> My question is how efficient will ZFS be, given that
> it will be layered on top of the hardware RAID and
> write cache?

ZFS delivers its best performance when used standalone, directly on whole disks. By using ZFS on top of a HW RAID, you make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and you slow down the I/O. You should see much better performance by not creating a HW RAID at all and instead adding all the disks in the 3320 enclosures to a ZFS RAIDZ pool. Additionally, given enough disks, it might be possible to squeeze out even better performance by creating several RAIDZ vdevs and striping across them. For a discussion of this aspect, please see the "WHEN TO (AND NOT TO) USE RAID-Z" treatise at http://blogs.sun.com/roch/entry/when_to_and_not_to.
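For illustration only (device names are made up, and the right split depends on how many disks the enclosures actually hold), the two pool layouts would look roughly like this:

    # one RAIDZ vdev across all disks in the enclosures
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

    # or several smaller RAIDZ vdevs, which the pool stripes across
    zpool create tank \
        raidz c2t0d0 c2t1d0 c2t2d0 \
        raidz c2t3d0 c2t4d0 c2t5d0

The second form trades a little capacity for more vdevs and therefore more concurrent I/O, which is the point of Roch's blog entry above.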
przemolicc at poczta.fm
2006-Sep-04 13:20 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Mon, Sep 04, 2006 at 01:59:53AM -0700, UNIX admin wrote:
> > My question is how efficient will ZFS be, given that
> > it will be layered on top of the hardware RAID and
> > write cache?
>
> ZFS delivers best performance when used standalone, directly on entire disks. By using ZFS on top of a HW RAID, you make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O.
>
> You should see much better performance by not creating a HW RAID, then adding all the disks in the 3320 enclosures to a ZFS RAIDZ pool.

This is the case where I don't understand Sun's politics at all: Sun doesn't offer a really cheap JBOD which can be bought just for ZFS. And don't even tell me about the 3310/3320 JBODs - they are horribly expensive :-(

If Sun wants ZFS to be adopted quicker it should have such a _really_ cheap JBOD.

przemol
Torrey McMahon
2006-Sep-04 19:19 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
UNIX admin wrote:
>> My question is how efficient will ZFS be, given that
>> it will be layered on top of the hardware RAID and
>> write cache?
>
> ZFS delivers best performance when used standalone, directly on entire disks. By using ZFS on top of a HW RAID, you make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O.

This is simply not true. ZFS would protect against the same type of errors seen on an individual drive as it would on a pool made of HW RAID LUN(s). It might be overkill to layer ZFS on top of a LUN that is already protected in some way by the device's internal RAID code, but it does not "make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O".

True, ZFS can't manage past the LUN into the array. Guess what? ZFS can't get past the disk drive firmware either... and that's a good thing for all parties involved.
Peter Sundstrom
2006-Sep-04 20:49 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
Hmm. There appear to be differing opinions. Another way of putting my question: can anyone guarantee that ZFS will not perform worse than UFS on the array?

High-speed performance is not really an issue, hence the reason the disks are mirrored rather than striped. The client is more concerned with redundancy (hence the cautious approach of having 3 hot spares).
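For reference, keeping both the mirroring and the spares on the ZFS side might look something like the sketch below - assuming a build recent enough to support hot spares, and with purely illustrative device names:

    # mirrored pairs plus hot spares, all managed by ZFS
    zpool create tank \
        mirror c1t0d0 c2t0d0 \
        mirror c1t1d0 c2t1d0 \
        spare c1t2d0 c2t2d0 c1t3d0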
Torrey McMahon
2006-Sep-04 21:18 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
Depends on the workload. (Did I miss that email?)

Peter Sundstrom wrote:
> Hmm. There appear to be differing opinions.
>
> Another way of putting my question: can anyone guarantee that ZFS will not perform worse than UFS on the array?
>
> High-speed performance is not really an issue, hence the reason the disks are mirrored rather than striped. The client is more concerned with redundancy (hence the cautious approach of having 3 hot spares).
On 9/5/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
> This is simply not true. ZFS would protect against the same type of
> errors seen on an individual drive as it would on a pool made of HW raid
> LUN(s). It might be overkill to layer ZFS on top of a LUN that is
> already protected in some way by the devices internal RAID code but it
> does not "make your data susceptible to HW errors caused by the storage
> subsystem's RAID algorithm, and slow down the I/O".

& Roch's recommendation to leave at least 1 layer of redundancy to ZFS allows the extension of ZFS's own redundancy features for some truly remarkable data reliability.

Perhaps the question should be how one could mix them to get the best of both worlds instead of going to either extreme.

> True, ZFS can't manage past the LUN into the array. Guess what? ZFS
> can't get past the disk drive firmware either... and that's a good thing
> for all parties involved.

--
Just me,
Wire ...
Robert Milkowski
2006-Sep-05 10:45 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello Wee,

Tuesday, September 5, 2006, 10:58:32 AM, you wrote:

WYT> On 9/5/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
>> This is simply not true. ZFS would protect against the same type of
>> errors seen on an individual drive as it would on a pool made of HW raid
>> LUN(s). It might be overkill to layer ZFS on top of a LUN that is
>> already protected in some way by the devices internal RAID code but it
>> does not "make your data susceptible to HW errors caused by the storage
>> subsystem's RAID algorithm, and slow down the I/O".

WYT> & Roch's recommendation to leave at least 1 layer of redundancy to ZFS
WYT> allows the extension of ZFS's own redundancy features for some truly
WYT> remarkable data reliability.

WYT> Perhaps the question should be how one could mix them to get the best
WYT> of both worlds instead of going to either extreme.

Depending on your data, it can sometimes be useful to create HW RAID LUNs and then do just striping on the ZFS side across at least two of them. That way you do not get full data protection, but you do get fs/pool metadata protection via ditto blocks. Of course, each LUN should be a HW RAID made of different physical disks.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
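As a rough sketch of that layout, with hypothetical LUN names (each LUN being a hardware RAID volume on its own set of disks):

    # plain ZFS stripe across two hardware-RAID LUNs; pool and filesystem
    # metadata is still written twice (ditto blocks), so metadata damage on
    # one LUN can be healed from the copy on the other
    zpool create tank c4t0d0 c5t0d0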
Jonathan Edwards
2006-Sep-05 16:42 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On Sep 5, 2006, at 06:45, Robert Milkowski wrote:
> [...]
> Depends on your data but sometimes it could be useful to create HW RAID
> and then do just striping on the ZFS side between at least two LUNs. That
> way you do not get data protection but fs/pool protection with ditto
> blocks. Of course each LUN is HW RAID made of different physical disks.

I remember working up a chart on this list about 2 months ago. Here are 10 options I can think of to summarize combinations of ZFS with HW redundancy:

 #   ZFS   ARRAY HW      CAPACITY   COMMENTS
 --  ---   --------      --------   --------
 1   R0    R1            N/2        hw mirror - no zfs healing (XXX)
 2   R0    R5            N-1        hw R5 - no zfs healing (XXX)
 3   R1    2 x R0        N/2        flexible, redundant, good perf
 4   R1    2 x R5        (N/2)-1    flexible, more redundant, decent perf
 5   R1    1 x R5        (N-1)/2    parity and mirror on same drives (XXX)
 6   RZ    R0            N-1        standard RAIDZ - no array RAID (XXX)
 7   RZ    R1 (tray)     (N/2)-1    RAIDZ+1
 8   RZ    R1 (drives)   (N/2)-1    RAID1+Z (highest redundancy)
 9   RZ    2 x R5        N-3        triple parity calculations (XXX)
 10  RZ    1 x R5        N-2        double parity calculations (XXX)

If you've invested in a RAID controller on an array, you might as well take advantage of it; otherwise you could probably get an old D1000 chassis somewhere and just run RAIDZ on JBOD.

If you're more concerned about redundancy than space, with the Sun/STK 3000 series dual-controller arrays I would either create at least 2 x RAID5 LUNs balanced across controllers and ZFS mirror them, or create at least 4 x RAID1 LUNs balanced across controllers and use RAIDZ. RAID0 isn't going to make that much sense, since you've got a 128KB txg commit on ZFS, which isn't going to be enough to do a full stripe in most cases.

.je
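To make a couple of those options concrete - LUN names are purely illustrative, with one set of LUNs per array controller:

    # option 4: ZFS mirror across two hardware RAID-5 LUNs
    zpool create tank mirror c4t0d0 c5t0d0

    # option 7: RAIDZ across four hardware RAID-1 LUNs, two per controller
    zpool create tank raidz c4t0d0 c4t1d0 c5t0d0 c5t1d0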
Torrey McMahon
2006-Sep-05 17:04 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Wee Yeh Tan wrote:
>
> Perhaps the question should be how one could mix them to get the best
> of both worlds instead of going to either extreme.

In the specific case of a 3320, I think Jonathan's chart has a lot of good info that can be put to use. In the general case, well, I hate to say this, but it depends.

From what I've seen, the general discussions on this list tend toward "Make my small direct-connected desktop/server go as fast as possible". Once you leave that space and move to the opposite end of the spectrum, a large heterogeneous datacenter, you have to start looking at the overall data management strategy and how the different pieces of technology get implemented. (Site-to-site array replication being a good example.) That's where I think you'll find more interesting cases where RAID setups will be used with ZFS on top more often than not.

There are also the speed enhancements provided by a HW RAID array, and usually RAS too, compared to a native disk drive, but the numbers on that are still coming in and being analyzed. (See previous threads.)

--
Torrey McMahon
Sun Microsystems Inc.
Richard Elling - PAE
2006-Sep-05 17:06 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Jonathan Edwards wrote:
> Here's 10 options I can think of to summarize combinations of zfs with
> hw redundancy:
>
> [chart of the 10 ZFS/array HW combinations elided]
>
> If you've invested in a RAID controller on an array, you might as well
> take advantage of it, otherwise you could probably get an old D1000
> chassis somewhere and just run RAIDZ on JBOD.

I think it would be good if RAIDoptimizer could be expanded to show these cases, too. Right now, the availability and performance models are simple. To go to this level, the models get more complex and there are many more tunables. However, for a few representative cases it might make sense to do deep analysis, even if that analysis does not get translated into a tool directly. We have the tools to do the deep analysis, but the models will need to be written and verified.

That said, does anyone want to see this sort of analysis? If so, what configurations should we do first? (Keep in mind that each config may take a few hours, maybe more, depending on the performance model.)
 -- richard
Wee Yeh Tan writes:
 > On 9/5/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
 > > This is simply not true. ZFS would protect against the same type of
 > > errors seen on an individual drive as it would on a pool made of HW raid
 > > LUN(s). It might be overkill to layer ZFS on top of a LUN that is
 > > already protected in some way by the devices internal RAID code but it
 > > does not "make your data susceptible to HW errors caused by the storage
 > > subsystem's RAID algorithm, and slow down the I/O".
 >
 > & Roch's recommendation to leave at least 1 layer of redundancy to ZFS
 > allows the extension of ZFS's own redundancy features for some truly
 > remarkable data reliability.
 >
 > Perhaps the question should be how one could mix them to get the best
 > of both worlds instead of going to either extreme.
 >
 > > True, ZFS can't manage past the LUN into the array. Guess what? ZFS
 > > can't get past the disk drive firmware either... and that's a good thing
 > > for all parties involved.

Thinking some more about this. If your requirements do mandate some form of mirroring, then it truly seems that ZFS should take that in charge, if only because of the self-healing characteristics. So I feel the storage array's job is to export low-latency LUNs to ZFS.

I'd be happy to live with those simple LUNs, but I guess some storage will just refuse to export non-protected LUNs. Now we can definitely take advantage of the array's capability of exporting highly resilient LUNs; RAID-5 seems to fit the bill rather well here. Even a 9+1 LUN will be quite resilient and have a low block overhead. So we benefit from the array's resiliency as well as its low-latency characteristics. And we mirror data at the ZFS level, which means great performance, great data integrity, and great availability.

Note that ZFS write characteristics (all sequential) mean that we will commonly be filling full stripes on the LUNs, thus avoiding the partial-stripe performance pitfall.

If you must shy away from any form of mirroring, then it's either stripe your RAID-5 LUNs (performance edge for those who live dangerously) or RAIDZ around those RAID-5 LUNs (lower cost, survives LUN failures).

-r
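In zpool terms, and with made-up LUN names, the layouts just described are simply:

    # mirror data at the ZFS level across two RAID-5 LUNs from the array
    zpool create tank mirror c6t0d0 c7t0d0

    # without mirroring: either stripe the RAID-5 LUNs ...
    zpool create tank c6t0d0 c7t0d0
    # ... or RAIDZ around them, which survives the loss of a whole LUN
    zpool create tank raidz c6t0d0 c6t1d0 c7t0d0 c7t1d0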
Torrey McMahon
2006-Sep-06 21:58 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Roch - PAE wrote:
> Thinking some more about this. If your requirements do
> mandate some form of mirroring, then it truly seems that ZFS
> should take that in charge, if only because of the
> self-healing characteristics. So I feel the storage array's
> job is to export low-latency LUNs to ZFS.

The hard part is getting a set of simple requirements. As you go into more complex data center environments you get hit with older Solaris revs, other OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem to be playing with ZFS is on the lower end of the complexity scale. Sure, throw your desktop some fast SATA drives. No problem. Oh wait, you've got ten Oracle DBs on three E25Ks that need to be backed up every other blue moon ...

I agree with the general idea that an array, be it one disk or some RAID combination, should simply export low-latency LUNs. However, it's the features offered by the array - like site-to-site replication - used to meet more complex requirements that literally slow things down. In many cases you'll see years-old operational procedures causing those low-latency LUNs to slow down even more. Something really hard to get a customer to undo because a new-fangled file system is out. ;)

> I'd be happy to live with those simple LUNs, but I guess some
> storage will just refuse to export non-protected LUNs. Now
> we can definitely take advantage of the array's capability
> of exporting highly resilient LUNs; RAID-5 seems to fit the
> bill rather well here. Even a 9+1 LUN will be quite
> resilient and have a low block overhead.

I think the 99x0 used to do 3+1 only. Now it's 7+1, if I recall. Close enough, I suppose.

> So we benefit from the array's resiliency as well as its low
> latency characteristics. And we mirror data at the ZFS level,
> which means great performance, great data integrity, and
> great availability.
>
> Note that ZFS write characteristics (all sequential) mean
> that we will commonly be filling full stripes on the LUNs,
> thus avoiding the partial-stripe performance pitfall.

One thing comes to mind in that case. Many arrays do sequential detect on the blocks that come in to the front-end ports. If things get split up too much, or arrive out of order, or <insert some strange array characteristic here>, then you could induce more latency as the array does cartwheels trying to figure out what's going on.
Nicolas Dorfsman
2006-Sep-07 08:15 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
> The hard part is getting a set of simple requirements. As you go into
> more complex data center environments you get hit with older Solaris
> revs, other OSs, SOX compliance issues, etc. etc. etc. The world where
> most of us seem to be playing with ZFS is on the lower end of the
> complexity scale. Sure, throw your desktop some fast SATA drives. No
> problem. Oh wait, you've got ten Oracle DBs on three E25Ks that need to
> be backed up every other blue moon ...

Another factor is CPU use.

Does anybody really know what the effects of an intensive CPU workload on ZFS performance will be, and what the effects of ZFS RAID CPU computation on an intensive CPU workload will be?

I heard a story about a customer complaining about his high-end server's performance; when a guy came on site... and discovered beautiful SVM RAID-5 volumes, the solution was almost found.

Nicolas
Torrey McMahon
2006-Sep-07 17:22 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Nicolas Dorfsman wrote:
>> The hard part is getting a set of simple
>> requirements. [...]
>
> Another factor is CPU use.
>
> Does anybody really know what the effects of an intensive CPU workload on
> ZFS performance will be, and what the effects of ZFS RAID CPU computation
> on an intensive CPU workload will be?
>
> I heard a story about a customer complaining about his high-end server's
> performance; when a guy came on site... and discovered beautiful SVM
> RAID-5 volumes, the solution was almost found.

RAID calculations take CPU time, but I haven't seen numbers on ZFS usage. SVM is known for using a fair bit of CPU when performing R5 calculations, and I'm sure other OSes have the same issue. EMC used to go around saying that offloading RAID calculations to their storage arrays would increase application performance because you would free up CPU time to do other stuff. The "EMC effect" is how they used to market it.
Richard Elling - PAE
2006-Sep-07 18:07 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Torrey McMahon wrote:
> RAID calculations take CPU time, but I haven't seen numbers on ZFS usage.
> SVM is known for using a fair bit of CPU when performing R5 calculations,
> and I'm sure other OSes have the same issue. EMC used to go around saying
> that offloading RAID calculations to their storage arrays would increase
> application performance because you would free up CPU time to do other
> stuff. The "EMC effect" is how they used to market it.

In all modern processors, and most ancient processors, XOR takes 1 CPU cycle and is easily pipelined. Getting the data from the disk to the registers takes thousands or hundreds of thousands of CPU cycles. You will more likely feel the latency of the read-modify-write for RAID-5 than the CPU time needed for the XOR. ZFS avoids the read-modify-write, but does compression, so it is possible that a few more CPU cycles will be used. But it should still be a big win because CPU cycles are less expensive than disk I/O. Meanwhile, I think we're all looking for good data on this.
 -- richard
Richard Elling - PAE wrote:
> Torrey McMahon wrote:
>> RAID calculations take CPU time, but I haven't seen numbers on ZFS
>> usage. [...]
>
> In all modern processors, and most ancient processors, XOR takes 1 CPU
> cycle and is easily pipelined. Getting the data from the disk to the
> registers takes thousands or hundreds of thousands of CPU cycles. You
> will more likely feel the latency of the read-modify-write for RAID-5
> than the CPU time needed for the XOR. ZFS avoids the read-modify-write,
> but does compression, so it is possible that a few more CPU cycles will
> be used. But it should still be a big win because CPU cycles are less
> expensive than disk I/O. Meanwhile, I think we're all looking for good
> data on this.
> -- richard

I believe the true answer is (wait for it...) It Depends(TM) on what you're limited by. If your system, under your load, is CPU constrained, then ZFS calculating the RAIDZ parity (and checksum) is going to hurt; if you are I/O constrained, then having the otherwise idle CPU do the work (which is, of course, more than just an XOR instruction, but we all know that) may help.

The ZFS design center of mostly-idle CPUs is not always accurate, although most customers don't dare push the system to 100% utilization. It's when you _do_ hit that point, or when the extra overhead unexpectedly makes you hit or go beyond that point, that things can get interesting quickly.

- Pete
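A quick, rough way to see which side of that line a running system is on is just the ordinary observability tools (the pool name here is only an example):

    # per-CPU utilization: watch the usr/sys/idl columns under load
    mpstat 5

    # pool-level throughput and per-vdev balance for the same interval
    zpool iostat -v tank 5

If the CPUs show plenty of idle while the pool is saturated, the parity/checksum work is essentially free; if idle is near zero, the extra cycles start to matter.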
On 9/7/06, Torrey McMahon <Torrey.McMahon at sun.com> wrote:
> Nicolas Dorfsman wrote:
> > [...]
> > Does anybody really know what the effects of an intensive CPU workload on
> > ZFS performance will be, and what the effects of ZFS RAID CPU computation
> > on an intensive CPU workload will be?

With ZFS I have found that memory is a much greater limitation. Even my dual-300MHz U2 has no problem filling 2x 20MB/s SCSI channels, even with compression enabled, using raidz and 10k rpm 9GB drives; thanks to its 2GB of RAM it does great at everything I throw at it.

On the other hand, my Blade 1500 with 512MB of RAM, 3x 18GB 10k rpm drives on 2x 40MB/s SCSI channels, and the OS on an 80GB IDE drive, has problems interactively: as soon as you push ZFS hard it hogs all the RAM, and it may take 5 or 10 seconds to get a response in an xterm while the machine clears out RAM and loads its applications/data back in.

James Dickens
uadmin.blogspot.com

> Raid calculations take CPU time but I haven't seen numbers on ZFS usage.
> [...]
Richard Elling - PAE
2006-Sep-07 19:14 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
przemolicc at poczta.fm wrote:
> This is the case where I don't understand Sun's politics at all: Sun
> doesn't offer a really cheap JBOD which can be bought just for ZFS. And
> don't even tell me about the 3310/3320 JBODs - they are horribly expensive :-(

Yep, multipacks have been EOL for some time now -- killed by big disks. Back when disks were small, people would buy multipacks to attach to their workstations. There was a time when none of the workstations had internal disks, but I'd be dating myself :-)

For datacenter-class storage, multipacks were not appropriate. They only had single-ended SCSI interfaces, which have a limited cable budget, which limited their use in racks. Also, they weren't designed to be used in a rack environment, so they weren't mechanically appropriate either. I suppose you can still find them on eBay.

> If Sun wants ZFS to be adopted quicker it should have such a _really_ cheap
> JBOD.

I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA chipset vendors putting RAID in the chipset. Basically, you can't get a "dumb" interface anymore, except for fibre channel :-). In other words, if we were to design a system in a chassis with perhaps 8 disks, then we would also use a controller which does RAID. So, we're right back to square 1.
 -- richard
The bigger problem with system utilization for software RAID is the cache, not the CPU cycles proper. Simply preparing to write 1 MB of data will flush half of a 2 MB L2 cache. This hurts overall system performance far more than the few microseconds that XORing the data takes. (A similar effect occurs with file system buffering, and this is one reason why direct I/O is attractive for databases - there's no pollution of the system cache.)
przemolicc at poczta.fm
2006-Sep-08 08:09 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Thu, Sep 07, 2006 at 12:14:20PM -0700, Richard Elling - PAE wrote:
> przemolicc at poczta.fm wrote:
> > This is the case where I don't understand Sun's politics at all: Sun
> > doesn't offer a really cheap JBOD which can be bought just for ZFS. And
> > don't even tell me about the 3310/3320 JBODs - they are horribly expensive :-(
>
> Yep, multipacks have been EOL for some time now -- killed by big disks.
> [...]
>
> > If Sun wants ZFS to be adopted quicker it should have such a _really_ cheap
> > JBOD.
>
> I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA
> chipset vendors putting RAID in the chipset. Basically, you can't get a
> "dumb" interface anymore, except for fibre channel :-). In other words, if
> we were to design a system in a chassis with perhaps 8 disks, then we would
> also use a controller which does RAID. So, we're right back to square 1.

Richard, when I talk about a cheap JBOD I am thinking about home users/small servers/small companies. I guess you can sell 100 X4500s and at the same time 1000 (or even more) cheap JBODs to the small companies which for sure will not buy the big boxes. Yes, I know, you earn more selling the X4500. But what do you think - how did Linux find its way into data centers and become an important player in the OS space? Through home users/enthusiasts who became familiar with it and then started using the familiar things in their jobs. A proven way to achieve "world domination". ;-))

przemol
> Roch - PAE wrote:
> The hard part is getting a set of simple requirements. As you go into
> more complex data center environments you get hit with older Solaris
> revs, other OSs, SOX compliance issues, etc. etc. etc. The world where
> most of us seem to be playing with ZFS is on the lower end of the
> complexity scale.

I've been watching this thread and unfortunately fit this model. I'd hoped that ZFS might scale enough to solve my problem, but you seem to be saying that it's mostly untested in large-scale environments.

About 7 years ago we ran out of inodes on our UFS file systems. We used bFile as middleware for a while to distribute the files across multiple disks, and then switched to VFS on SAN about 5 years ago. Distribution across file systems and inode depletion continued to be a problem, so we switched middleware to another vendor that essentially compresses about 200 files into a single 10MB archive and uses a DB to find the file within the archive on the correct disk. An expensive, complex and slow but effective solution, until the latest license renewal when we got hit with a huge bill.

I'd love to go back to a pure file system model and have looked at Reiser4, JFS, NTFS and now ZFS for a way to support over 100 million small documents and 16TB. We average 2 file reads and 1 file write per second 24/7, with expected growth to 24TB. I'd be willing to scrap everything we have to find a non-proprietary long-term solution. ZFS looked like it might provide an answer. Are you saying it's not really suitable for this type of application?
Torrey McMahon writes:
 > Nicolas Dorfsman wrote:
 > > Does anybody really know what the effects of an intensive CPU workload
 > > on ZFS performance will be, and what the effects of ZFS RAID CPU
 > > computation on an intensive CPU workload will be?
 >
 > RAID calculations take CPU time, but I haven't seen numbers on ZFS usage.
 > [...]

I just measured quickly that a 1.2GHz SPARC can do [400-500] MB/sec of encoding (time spent in the misnamed function vdev_raidz_reconstruct) for a 3-disk raid-z group. Bigger groups should cost more, but I'd also expect the cost to decrease with increasing CPU frequency.

Note that the raidz cost is impacted by this:

  6460622 zio_nowait() doesn't live up to its name

-r
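One rough way to reproduce this kind of measurement on your own box, assuming DTrace is available, is a kernel profiling one-liner; the function names that show up (raidz parity, checksum) will vary by build, so treat this as a sketch rather than an exact recipe:

    # sample the kernel ~997 times/sec for 30s while a write load runs,
    # then look for raidz/checksum functions near the top of the output
    dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-30s { exit(0); }'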
Darren J Moffat
2006-Sep-08 08:41 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
przemolicc at poczta.fm wrote:
> Richard, when I talk about a cheap JBOD I am thinking about home users/small
> servers/small companies. I guess you can sell 100 X4500s and at the same
> time 1000 (or even more) cheap JBODs to the small companies which for sure
> will not buy the big boxes. Yes, I know, you earn more selling the X4500.
> But what do you think - how did Linux find its way into data centers
> and become an important player in the OS space? Through home users/enthusiasts
> who became familiar with it and then started using the familiar things in
> their jobs.

But Linux isn't a hardware vendor and doesn't make cheap JBODs or multipacks for the home user. So I don't see how we get from "Sun should make a cheap home-user JBOD" (which BTW we don't really have the channel to sell anyway) to "but Linux dominated this way".

--
Darren J Moffat
przemolicc at poczta.fm
2006-Sep-08 09:23 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Fri, Sep 08, 2006 at 09:41:58AM +0100, Darren J Moffat wrote:
> przemolicc at poczta.fm wrote:
> > Richard, when I talk about a cheap JBOD I am thinking about home users/small
> > servers/small companies. [...]
>
> But Linux isn't a hardware vendor and doesn't make cheap JBODs or
> multipacks for the home user.

Linux is used as a symbol.

> So I don't see how we get from "Sun should make a cheap home-user JBOD"
> (which BTW we don't really have the channel to sell anyway) to "but
> Linux dominated this way".

"Home user" = the tech/geek/enthusiast who is an admin at work.

[ Linux ]
The "home user" uses Linux at home and is satisfied with it. He/she then goes to work and says "Let's install/use it on less important servers". He/she (and management) is again satisfied with it. So let's use it on more important servers... etc.

[ ZFS ]
The "home user" uses ZFS (Solaris) at home (remember the easiness and even the web interface to ZFS operations!) to keep photos, music, etc. and is satisfied with it. He/she then goes to work and says "I have been using a fantastic filesystem for a while. Let's use it on less important servers". OK. Later on: "Works fine. Let's use it on more important ones...". Etc.

Yes, I know, a bit naive. But remember that not only Linux spreads this way; Solaris does as well. I guess most of the downloaded Solaris CDs/DVDs are for x86. You as a company "attack" at the high-end/midrange level. Let users/admins/fans "attack" at the lower-end level.

przemol
Robert Milkowski
2006-Sep-08 09:41 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello James,

Thursday, September 7, 2006, 8:58:10 PM, you wrote:

JD> with ZFS I have found that memory is a much greater limitation, even
JD> my dual 300mhz u2 has no problem filling 2x 20MB/s scsi channels, even
JD> with compression enabled, using raidz and 10k rpm 9GB drives, thanks
JD> to its 2GB of ram it does great at everything I throw at it. On the
JD> other hand my blade 1500 ram 512MB with 3x 18GB 10k rpm drives using
JD> 2x 40MB/s scsi channels, os is on a 80GB ide drive, has problems
JD> interactively because as soon as you push zfs hard it hogs all the ram
JD> and may take 5 or 10 seconds to get response on xterms while the
JD> machine clears out ram and loads its applications/data back into ram.

IIRC there's a bug in the SPARC ata driver which, when combined with ZFS, expresses itself. Unless you use only ZFS on those SCSI drives...?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
zfs "hogs all the ram" under a sustained heavy write load. This is being tracked by: 6429205 each zpool needs to monitor it''s throughput and throttle heavy writers -r
Roch - PAE
2006-Sep-08 10:11 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
Jim Sloey writes:
 > > Roch - PAE wrote:
 > > The hard part is getting a set of simple requirements. [...]
 >
 > I've been watching this thread and unfortunately fit this model. I'd
 > hoped that ZFS might scale enough to solve my problem, but you seem to
 > be saying that it's mostly untested in large-scale environments.
 > [...]
 > I'd love to go back to a pure file system model and have looked at
 > Reiser4, JFS, NTFS and now ZFS for a way to support over 100 million
 > small documents and 16TB. We average 2 file reads and 1 file write per
 > second 24/7, with expected growth to 24TB. I'd be willing to scrap
 > everything we have to find a non-proprietary long-term solution.
 > ZFS looked like it might provide an answer. Are you saying it's not
 > really suitable for this type of application?

I don't think that was the point of the post. I read it to mean that some customers, because of considerations outside of ZFS, have a need to use storage arrays in ways that may not allow ZFS to develop its full potential. If you don't replicate within ZFS, then ZFS will not be able to heal corrupted blocks. But if your storage model allows for ZFS replication, then the quote is not aimed at your case.

Are you going to grow to 24TB using a few writes per second?

-r
On Fri, 8 Sep 2006, Jim Sloey wrote:
> > Roch - PAE wrote:
> > The hard part is getting a set of simple requirements. [...]
>
> I've been watching this thread and unfortunately fit this model. I'd
> hoped that ZFS might scale enough to solve my problem, but you seem to be
> saying that it's mostly untested in large-scale environments.
> [...]
> I'd be willing to scrap everything we have to find a non-proprietary
> long-term solution. ZFS looked like it might provide an answer. Are you
> saying it's not really suitable for this type of application?

No - that's not what he is saying. Personally I think (from the info presented) that ZFS would be a viable long-term solution to this storage headache. But the neat thing about ZFS is that, with a spare AMD-based box and as few as 5 low-cost SATA drives, you can actually try it[1].

Think about this for a second: you can put together a test ZFS box for less money than you would spend, in man-hours, talking about it as a _possible_ solution.

[1] 5 to 10 SATA drives won't get you 16TB - but it'll get you close enough to model the system with a substantial portion of your dataset.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
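A minimal sketch of such a test box (device names are hypothetical, and the layout is only meant to mimic the document store, not match production):

    # one RAIDZ vdev over five SATA drives, plus a compressed filesystem
    # to stand in for the small-document archive
    zpool create testpool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
    zfs create testpool/docs
    zfs set compression=on testpool/docs

From there a simple load generator writing millions of small files will show whether the inode-depletion and distribution problems really do go away.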
Jim Sloey
2006-Sep-08 13:33 UTC
[zfs-discuss] Re: Re: Re: Recommendation ZFS on StorEdge 3320
rbourbon writes:
> I don't think that was the point of the post. I read it to mean that
> some customers, because of considerations outside of ZFS, have a need
> to use storage arrays in ways that may not allow ZFS to develop its
> full potential.

I've been following this thread because we have redundant load-balanced servers, a SAN, and replication to a disaster recovery site 800 miles away. We will probably not be able to use ZFS to its full potential (especially for replication); however, it does solve our inode depletion problem and eliminates middleware. Not trying to hijack the thread, just trying to learn from others' experience before I commit.

> Are you going to grow to 24TB using a few writes per
> second?

Actually 24TB (8TB growth from current capacity) is the low-end projection: 60 sec * 60 min * 24 hours * 365 days = 31,536,000 new files/year * 3 yrs until the next technology refresh/upgrade.
Richard Elling - PAE
2006-Sep-08 16:33 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
przemolicc at poczta.fm wrote:
>> I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA
>> chipset vendors putting RAID in the chipset. Basically, you can't get a
>> "dumb" interface anymore, except for fibre channel :-). In other words, if
>> we were to design a system in a chassis with perhaps 8 disks, then we would
>> also use a controller which does RAID. So, we're right back to square 1.
>
> Richard, when I talk about a cheap JBOD I am thinking about home users/small
> servers/small companies. I guess you can sell 100 X4500s and at the same
> time 1000 (or even more) cheap JBODs to the small companies which for sure
> will not buy the big boxes. [...]

I was looking for a new AM2-socket motherboard a few weeks ago. All of the ones I looked at had 2x IDE and 4x SATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over, except for single-disk systems. 4x 750 GBytes is a *lot* of data (and video).

There has been some recent discussion about eSATA JBODs in the press. I'm not sure they will gain much market share. iPods and flash drives have a much larger market share.

> A proven way to achieve "world domination". ;-))

Dang! I was planning to steal a cobalt bomb and hold the world hostage while I relax in my space station... zero-G, whee! :-)
 -- richard
Bill Sommerfeld
2006-Sep-08 18:05 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Fri, 2006-09-08 at 09:33 -0700, Richard Elling - PAE wrote:
> There has been some recent discussion about eSATA JBODs in the press. I'm not
> sure they will gain much market share. iPods and flash drives have a much
> larger market share.

Dunno about eSATA JBODs, but eSATA host ports have appeared on at least two HDTV-capable DVRs for storage expansion (it looks like one model of the Scientific Atlanta cable-box DVRs, as well as the shipping-any-day-now TiVo Series 3).

It's strange that they didn't go with FireWire, since it's already widely used for digital video.

- Bill
Ed Gould
2006-Sep-08 18:22 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
> I was looking for a new AM2 socket motherboard a few weeks ago. All of
> the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All
> were less than $150. In other words, the days of having a JBOD-only
> solution are over except for single disk systems. 4x750 GBytes is a
> *lot* of data (and video).

It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen are really software RAID solutions that know just enough in the controller to let the BIOS boot off a RAID volume. None of the expensive RAID stuff is in the controller.

--Ed
Torrey McMahon
2006-Sep-08 18:35 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
>> I was looking for a new AM2 socket motherboard a few weeks ago. [...]
>> In other words, the days of having a JBOD-only solution are over except
>> for single disk systems. 4x750 GBytes is a *lot* of data (and video).
>
> It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've
> seen are really software RAID solutions that know just enough in the
> controller to let the BIOS boot off a RAID volume. None of the
> expensive RAID stuff is in the controller.

If I read between the lines here, I think you're saying that the RAID functionality is in the chipset but the management can only be done by software running on the outside. (Right?)

A1000 anyone? :)
Ed Gould
2006-Sep-08 18:40 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
> If I read between the lines here, I think you're saying that the RAID
> functionality is in the chipset but the management can only be done by
> software running on the outside. (Right?)

No. All that's in the chipset is enough to read a RAID volume for boot. Block layout, RAID-5 parity calculations, and the rest are all done in software. I wouldn't be surprised if RAID-5 parity checking was absent on read for boot, but I don't actually know.

--Ed
Jonathan Edwards
2006-Sep-08 18:41 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 14:22, Ed Gould wrote:
> It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've
> seen are really software RAID solutions that know just enough in
> the controller to let the BIOS boot off a RAID volume. None of the
> expensive RAID stuff is in the controller.

Additionally, the only RAID levels many of them support favor mirroring and striping (RAID 0, 1, 10, etc.); not as many do parity.
Bennett, Steve
2006-Sep-08 22:14 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
> Dunno about eSATA jbods, but eSATA host ports have
> appeared on at least two HDTV-capable DVRs for storage
> expansion (looks like one model of the Scientific Atlanta
> cable box DVRs as well as on the shipping-any-day-now
> TiVo Series 3).
>
> It's strange that they didn't go with firewire since it's
> already widely used for digital video.

Cost? If you use eSATA it's pretty much just a physical connector onto the board, whereas I guess FireWire needs a 1394 interface (a couple of dollars?) plus a royalty to all the patent holders. It's probably not much, but I can't see how there can be *any* margin in consumer electronics these days...

Steve.
Richard Elling - PAE
2006-Sep-09 00:59 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
>> If I read between the lines here, I think you're saying that the RAID
>> functionality is in the chipset but the management can only be done by
>> software running on the outside. (Right?)
>
> No. All that's in the chipset is enough to read a RAID volume for
> boot. Block layout, RAID-5 parity calculations, and the rest are all
> done in software. I wouldn't be surprised if RAID-5 parity checking
> was absent on read for boot, but I don't actually know.

At Sun, we often use the LSI Logic LSISAS1064 series of SAS RAID controllers on motherboards for many products. [LSI claims support for Solaris 2.6!] These controllers have a built-in microcontroller (ARM 926, IIRC), firmware, and nonvolatile memory (NVSRAM) for implementing the RAID features. We manage them through the BIOS, OBP, or raidctl(1m). As Torrey says, very much like the A1000. Some of the fancier LSI products offer RAID 5, too.
 -- richard
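For reference, a rough sketch of driving such an onboard controller with raidctl(1m); the exact options and target names depend on the platform, so treat these as illustrative:

    # list any RAID volumes the controller already knows about
    raidctl -l

    # create a hardware mirror from two disks behind the controller
    raidctl -c c0t0d0 c0t1d0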
Anton B. Rang
2006-Sep-09 03:34 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320 - offtopic
The better SATA RAID cards have hardware support. One site comparing controllers is:

  http://tweakers.net/reviews/557

Five of the eight controllers they looked at implemented RAID in hardware; one of the others implemented only the XOR in hardware. Chips like the Adaptec AIC-8210 implement multiple SATA ports, RAID-5, RAID-6, and a microcontroller in a single chip.

JBOD probably isn't dead, simply because motherboard manufacturers are unlikely to pay the extra $10 it might cost to use a RAID-enabled chip rather than a plain chip (and the cost is more if you add cache RAM); but basic RAID is at least cheap.

Of course, having RAID in the HBA is a single point of failure!
Richard Elling - PAE
2006-Sep-09 05:28 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320 - offtopic
Anton B. Rang wrote:
> JBOD probably isn't dead, simply because motherboard manufacturers are unlikely to pay
> the extra $10 it might cost to use a RAID-enabled chip rather than a plain chip (and
> the cost is more if you add cache RAM); but basic RAID is at least cheap.

NVidia MCPs (later NForce chipsets) also do RAID. The NForce 5x0 systems even do RAID-5 and sparing (with 6 SATA ports). Using special-purpose RAID chips won't be necessary for desktops or low-end systems. Moore's law says that we can continue to integrate more and more functions onto fewer parts.

> Of course, having RAID in the HBA is a single point of failure!

At this level, and price point, there are many SPOFs. Indeed, there is always at least one SPOF.
 -- richard
Frank Cusack
2006-Sep-09 06:11 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On September 8, 2006 5:59:47 PM -0700 Richard Elling - PAE <Richard.Elling at Sun.COM> wrote:
> Ed Gould wrote:
>> No. All that's in the chipset is enough to read a RAID volume for
>> boot. Block layout, RAID-5 parity calculations, and the rest are all
>> done in software. [...]
>
> At Sun, we often use the LSI Logic LSISAS1064 series of SAS RAID controllers
> on motherboards for many products. [LSI claims support for Solaris 2.6!]
> These controllers have a built-in microcontroller (ARM 926, IIRC), firmware,
> and nonvolatile memory (NVSRAM) for implementing the RAID features. We manage
> them through the BIOS, OBP, or raidctl(1m). As Torrey says, very much like
> the A1000. Some of the fancier LSI products offer RAID 5, too.

Yes, some (many) of the RAID controllers do all the RAID in hardware. I don't see where Ed was disputing that. But there will always be a [large] market for cheaper but less capable products, and so at least for a while to come there will be these not-quite-RAID cards. Probably for a very long while. Winmodem, anyone?

-frank
On September 7, 2006 12:25:47 PM -0700 "Anton B. Rang" <Anton.Rang at Sun.COM> wrote:
> The bigger problem with system utilization for software RAID is the
> cache, not the CPU cycles proper. Simply preparing to write 1 MB of data
> will flush half of a 2 MB L2 cache. This hurts overall system performance
> far more than the few microseconds that XORing the data takes.

Interesting. So does this in any way invalidate the benchmarks recently posted here which showed raidz on JBOD outperforming a ZFS stripe on HW RAID-5? (That's my recollection; perhaps it's a mischaracterization or just plain wrong.)

I mean, even if raidz on JBOD wins a filesystem benchmark, when you have an actual application with a working set that is more than filesystem data, the benchmark results could be misleading. Ultimately, you do want to use your actual application as the benchmark, but certainly generic benchmarks should at least be helpful.

-frank
On Sep 9, 2006, at 1:32 AM, Frank Cusack wrote:
> On September 7, 2006 12:25:47 PM -0700 "Anton B. Rang"
> <Anton.Rang at Sun.COM> wrote:
>> The bigger problem with system utilization for software RAID is the
>> cache, not the CPU cycles proper. Simply preparing to write 1 MB of data
>> will flush half of a 2 MB L2 cache. This hurts overall system performance
>> far more than the few microseconds that XORing the data takes.
>
> Interesting. So does this in any way invalidate the benchmarks recently
> posted here which showed raidz on JBOD outperforming a ZFS stripe on HW
> RAID-5?

No. There are, in fact, two reasons why RAID-Z is likely to outperform hardware RAID-5, at least in certain types of I/O benchmarks. First, RAID-5 requires read-modify-write cycles when full stripes aren't being written; and ZFS tends to issue small and pretty much random I/O (in my experience), which is the worst case for RAID-5. Second, performing RAID on the main CPU is faster, or at least just as fast, as doing it in hardware.

There are also cases where hardware RAID-5 will likely outperform ZFS. One is when there is a large RAM cache (which is not being flushed by ZFS -- one issue to be addressed is that the commands ZFS uses to control the write cache on plain disks tend to effectively disable the NVRAM cache on hardware RAID controllers). Another is when the I/O bandwidth being used is near the maximum capacity of the host channel, because doing software RAID requires moving more data over this channel. (If you have sufficient channels to dedicate one per disk, as is the case with SATA, this doesn't come into play.) This is particularly noticeable during reconstruction, since the channels are being used both to transfer data and to reconstruct it, whereas in a hardware RAID-5 box (of moderate cost, at least) they are typically overprovisioned. A third is if the system CPU or memory bandwidth is heavily used by your application, for instance a database running under heavy load. In this case, the added CPU, cache, and memory-bandwidth cost of software RAID will stress the application.

> Ultimately, you do want to use your actual application as the benchmark,
> but certainly generic benchmarks should at least be helpful.

They're helpful in measuring what the benchmark measures. ;-) If the benchmark measures how quickly you can get data from host RAM to disk, which is typically the case, it won't tell you anything about how much CPU was used in the process. Real applications, however, often care. There's a reason why we use interrupt-driven controllers, even though you get better performance of the I/O itself with polling. :-)

Anton
Anton B. Rang writes:
 > The bigger problem with system utilization for software RAID is the cache,
 > not the CPU cycles proper. Simply preparing to write 1 MB of data will flush
 > half of a 2 MB L2 cache. This hurts overall system performance far more than
 > the few microseconds that XORing the data takes.

With ZFS, on most deployments we'll bring the data into cache for the checksums anyway, so I guess the raid-z cost will be just incremental. Now, would we gain anything by generating combined ZFS functions for 'checksum+parity' and 'checksum+parity+compression'?

-r
UNIX admin
2006-Sep-12 18:12 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
> This is simply not true. ZFS would protect against
> the same type of errors seen on an individual drive as it would on a
> pool made of HW raid LUN(s). It might be overkill to layer ZFS on top of a
> LUN that is already protected in some way by the devices internal
> RAID code but it does not "make your data susceptible to HW errors
> caused by the storage subsystem's RAID algorithm, and slow down the I/O".

I disagree, and vehemently at that. I maintain that if HW RAID is used, the chance of data corruption is much higher, and ZFS would have a lot more repairing to do than it would if it were used directly on disks. Problems with HW RAID algorithms have been plaguing us for at least 15 years or more. The venerable Sun StorEdge T3 comes to mind!

Further, while it is perfectly logical to me that doing RAID calculations twice is slower than doing them once, you maintain that is not the case, perhaps because one calculation is implemented in FW/HW? Well, why don't you simply try it out? Once with both RAID HW and ZFS, and once with just ZFS directly on the disks?

RAID HW is very likely to have a slower CPU or CPUs than any modern system that ZFS will be running on. Even if we assume that the HW RAID's CPU is the same speed or faster than the CPU in the server, you still have TWICE the amount of work that has to be performed for every write: once by the hardware and once by the software (ZFS). Caches might help some, but I fail to see how double the amount of work (and hidden, abstracted complexity) would be as fast or faster than just using ZFS directly on the disks.
UNIX admin
2006-Sep-12 18:35 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
> There are also the speed enhancement provided by a HW raid array, and usually RAS too, compared to a native disk drive but the numbers on that are still coming in and being analyzed. (See previous threads.)

Speed enhancements? What is the baseline of comparison?

Hardware RAIDs essentially come down to two features: a cache which reorders data for optimal disk writes, and parity calculation which is offloaded from the server's CPU.

But HW calculations still take time, and the in-between, battery-backed cache serves to replace the individual disk caches, because the traditional file system approach had to have some assurance that the data made it to disk in one way or another.

With ZFS, however, the in-between cache is obsolete, as individual disk caches can be used directly. I also openly question whether even the dedicated RAID HW is faster than the newest CPUs in modern servers.

Unless there is something that I'm missing, I fail to see the benefit of a HW RAID in tandem with ZFS. In my view, this holds especially true when one gets into SAN storage like the SE6920, EMC and Hitachi products.

Furthermore, need I remind you of the buggy SE6920 firmware? I don't trust it as far as I can throw it. Or, let's put it this way: I trust Mr. Bonwick a whole lot more than some firmware writers.

This message posted from opensolaris.org
On September 12, 2006 11:35:54 AM -0700 UNIX admin <tripivceta at hotmail.com> wrote:

>> There are also the speed enhancement provided by a HW raid array, and usually RAS too, compared to a native disk drive but the numbers on that are still coming in and being analyzed. (See previous threads.)

It would be nice if you would attribute your quotes. Maybe this is a limitation of the web interface?

> Speed enhancements? What is the baseline of comparison?
>
> Hardware RAIDs essentially come down to two features: a cache which reorders data for optimal disk writes, and parity calculation which is offloaded from the server's CPU.
>
> But HW calculations still take time, and the in-between, battery-backed cache serves to replace the individual disk caches, because the traditional file system approach had to have some assurance that the data made it to disk in one way or another.
>
> With ZFS, however, the in-between cache is obsolete, as individual disk caches can be used directly. I also openly question whether even the dedicated RAID HW is faster than the newest CPUs in modern servers.
>
> Unless there is something that I'm missing, I fail to see the benefit of a HW RAID in tandem with ZFS. In my view, this holds especially true when one gets into SAN storage like the SE6920, EMC and Hitachi products.

I agree with your basic point, that the HW RAID cache is obsoleted by zfs (which seems to be substantiated here by benchmark results), but I think you slightly mischaracterize its use. The speed of the HW RAID CPU is irrelevant; the parity is XOR, which is extremely fast on any CPU when compared to disk write speed. What is relevant is, as Anton points out, the CPU cache on the host system. Parity calculations kill the cache and will hurt memory-intensive apps, so offloading them may help in the ufs case. (Not for zfs, as I understand from reading here, since checksums still have to be done. I would argue that this is *absolutely essential* [and zfs obsoletes all other filesystems], and therefore the gain in the ufs-on-HW-RAID-5 case is worthless due to the correctness tradeoff.)

It would be interesting to have a zfs enabled HBA to offload the checksum and parity calculations. How much of zfs would such an HBA have to understand?

-frank
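One quick way to sanity-check the "XOR is fast relative to disk" claim is a micro-benchmark along these lines. This is my own sketch, not anything from the thread; the 64 MB buffer size and the use of clock_gettime() are arbitrary choices, and the number it prints includes memory-bandwidth effects, which is part of the point being made about cache pressure.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

/*
 * Rough XOR-parity throughput check: XOR a 64 MB source buffer into a
 * parity buffer and report MB/s. Compare the result against the
 * sequential write speed of the disks behind the pool.
 */
#define BUF_BYTES (64UL * 1024 * 1024)

int
main(void)
{
    size_t nwords = BUF_BYTES / sizeof (uint64_t);
    uint64_t *data = malloc(BUF_BYTES);
    uint64_t *parity = calloc(nwords, sizeof (uint64_t));
    struct timespec t0, t1;

    if (data == NULL || parity == NULL)
        return (1);
    for (size_t i = 0; i < nwords; i++)
        data[i] = i;            /* touch every page before timing */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < nwords; i++)
        parity[i] ^= data[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) +
        (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("XOR throughput: %.0f MB/s\n", (BUF_BYTES / 1e6) / secs);

    free(data);
    free(parity);
    return (0);
}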
Robert Milkowski
2006-Sep-12 20:25 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello Frank,

Tuesday, September 12, 2006, 9:41:05 PM, you wrote:

FC> It would be interesting to have a zfs enabled HBA to offload the checksum
FC> and parity calculations. How much of zfs would such an HBA have to
FC> understand?

That wouldn't be end-to-end checksumming anymore, right? At that point you might as well disable ZFS checksumming entirely and rely on the HW RAID alone.

--
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
Torrey McMahon
2006-Sep-12 20:56 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
UNIX admin wrote:
>> This is simply not true. ZFS would protect against the same type of errors seen on an individual drive as it would on a pool made of HW raid LUN(s). It might be overkill to layer ZFS on top of a LUN that is already protected in some way by the devices internal RAID code but it does not "make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O".
>
> I disagree, and vehemently at that. I maintain that if HW RAID is used, the chance of data corruption is much higher, and ZFS would have a lot more repairing to do than it would if it were used directly on disks. Problems with HW RAID algorithms have been plaguing us for at least 15 years; the venerable Sun StorEdge T3 comes to mind!

Please expand on your logic. Remember that ZFS works on top of LUNs. A disk drive by itself is a LUN when added to a ZFS pool. A LUN can also be comprised of multiple disk drives striped together and presented to a host as one logical unit. Or a LUN can be offered by a virtualization gateway that in turn imports RAID array LUNs that are really made up of individual disk drives. Or ... insert a million different ways to give a host something called a LUN that allows the host to read and write blocks. They could be really slow LUNs because they're two hamsters shuffling zeros and ones back and forth on little wheels. (OK, that might be too slow.)

Outside of the cache enabling when entire disk drives are presented to the pool, ZFS doesn't care what the LUN is made of. ZFS reliability features are available and work on top of the LUNs you give it and the configuration you use. The type of LUN is inconsequential at the ZFS level. If I had 12 LUNs that were single disk drives and created a RAIDZ pool, it would have the same reliability at the ZFS level as if I presented it 12 LUNs that were really quad-mirrors from 12 independent HW RAID arrays. You can make the argument that the 12-disk-drive config is easier to use, or that the overall reliability of the 12-quad-mirror-LUN system is higher, but from ZFS's point of view it's the same: it's happily writing blocks, checking checksums, reading things from the LUNs, etc. etc. etc.

On top of that, disk drives are not some simple beast that just coughs up I/O when you want it to. A modern disk drive does all sorts of stuff under the covers to speed up I/O and - surprise - increase the reliability of the drive as much as possible. If you think you're really writing "straight to disk", you're not. Cache, ZBR, and bad block re-allocation all come into play.

As for problems with specific RAID arrays, including the T3, you are preaching to the choir, but I'm definitely not going to get into a pissing contest over specific components having more or fewer bugs than another.

> Further, while it is perfectly logical to me that doing RAID calculations twice is slower than doing them once, you maintain that is not the case, perhaps because one calculation is implemented in FW/HW?

As the man says, "It depends". A really fast RAID array might be responding to I/O requests faster than a single disk drive. It might not, given the nature of the I/O coming in. Don't think of it in terms of RAID calculations taking a certain amount of time. Think of it in terms of having to meet a specific set of requirements to manage your data. I'll be the first to say that if you're going to be putting ZFS on a desktop, then a simple JBOD is the box to look at. If you're going to look at an enterprise data center, the answer is going to be different. That is something a lot of people on this alias seem to be missing. Stating that ZFS on JBODs is the answer to everything is the punchline of the "When all you have is a hammer..." routine.
James C. McPherson
2006-Sep-13 05:14 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Richard Elling wrote:
> Frank Cusack wrote:
>> It would be interesting to have a zfs enabled HBA to offload the checksum
>> and parity calculations. How much of zfs would such an HBA have to
>> understand?
> [warning: chum]
> Disagree. HBAs are pretty wimpy. It is much less expensive and more
> efficient to move that (flexible!) function into the main CPUs.

I think Richard is in the groove here. All the HBA chip implementation documentation that I've seen (publicly available, of course) indicates that these chips are already highly optimized engines, and I don't think that adding extra functionality like checksum and parity calculations would be an efficient use of silicon/SoI.

cheers,
James
Richard Elling
2006-Sep-13 06:45 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Frank Cusack wrote:
> It would be interesting to have a zfs enabled HBA to offload the checksum
> and parity calculations. How much of zfs would such an HBA have to
> understand?

[warning: chum]
Disagree. HBAs are pretty wimpy. It is much less expensive and more efficient to move that (flexible!) function into the main CPUs.
 -- richard
James C. McPherson wrote:
> Richard Elling wrote:
>> Frank Cusack wrote:
>>> It would be interesting to have a zfs enabled HBA to offload the checksum
>>> and parity calculations. How much of zfs would such an HBA have to
>>> understand?
>> [warning: chum]
>> Disagree. HBAs are pretty wimpy. It is much less expensive and more
>> efficient to move that (flexible!) function into the main CPUs.
>
> I think Richard is in the groove here. All the HBA chip implementation documentation that I've seen (publicly available, of course) indicates that these chips are already highly optimized engines, and I don't think that adding extra functionality like checksum and parity calculations would be an efficient use of silicon/SoI.

HBAs work on an entirely different layer than where checksumming data would be efficient. If we use the OSI-style model for this, HBAs work at layer 1. And, as James mentioned, they are highly specialized ASICs for doing just bus-level communications; it's not as if there is extra general-purpose compute power available (or that it could even be built in). Checksumming for ZFS requires filesystem-level knowledge, which is effectively up at OSI layer 6 or 7, and well beyond the understanding of a lowly HBA (it's just passing bits back and forth, and has no conception of what they mean). Essentially, moving block checksumming into the HBA would at best be similar to what we see with super-low-cost RAID controllers and the XOR function. Remember how well that works?

Now, building ZFS-style checksum capability (or just hardware checksum capability for ZFS to call) is indeed proper and possible for _real_ hardware RAID controllers, as they are much more akin to standard general-purpose CPUs (indeed, most now use a GP processor anyway).

We're back into the old argument of "put it on a co-processor, then move it onto the CPU, then move it back onto a co-processor" cycle. Personally, with modern CPUs being so under-utilized these days, and all ZFS-bound data having to move through main memory in any case (whether hardware checksum-assisted or not), use the CPU. Hardware-assist for checksum sounds nice, but I can't think of it actually being more efficient than doing it on the CPU (it won't actually help performance), so why bother with extra hardware?

-Erik
Casper.Dik at Sun.COM
2006-Sep-13 09:25 UTC
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
> We're back into the old argument of "put it on a co-processor, then move
> it onto the CPU, then move it back onto a co-processor" cycle.
> Personally, with modern CPUs being so under-utilized these days, and all
> ZFS-bound data having to move through main memory in any case (whether
> hardware checksum-assisted or not), use the CPU. Hardware-assist for
> checksum sounds nice, but I can't think of it actually being more
> efficient than doing it on the CPU (it won't actually help performance),
> so why bother with extra hardware?

Plus, it moves part of the resiliency away from where we knew the data was good (the CPU/computer), across a bus/fabric/whatnot, possibly causing checksums to be computed over incorrect data. We already see that with IP checksum off-loading and broken hardware, and with broken VLAN switches recomputing the Ethernet CRC.

Casper
Anton B. Rang
2006-Sep-13 15:57 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
> It would be interesting to have a zfs enabled HBA to offload the checksum
> and parity calculations. How much of zfs would such an HBA have to
> understand?

That's an interesting question.

For parity, it's actually pretty easy. One can envision an HBA which took a group of related write commands and computed the parity on the fly, using it for a final write command. This would, however, probably limit the size of a block that could be written to whatever amount of memory was available for buffering on the HBA. (Of course, memory is relatively cheap these days, but it's still not free, so the HBA might have only a few megabytes.)

The checksum is more difficult. If you're willing to delay writing an indirect block until all of its children have been written [*], then we can just compute the checksum for each block as it goes out, and that's easy [**] -- easier than the parity, in fact, since there's no buffering required beyond the checksum itself. ZFS in fact does delay this write at present. However, I've argued in the past that ZFS shouldn't delay it, but should write indirect blocks in parallel with the data blocks. It would be interesting to determine whether the performance improvement of doing checksums on the HBA would outweigh the potential benefit of writing indirect blocks in parallel. Maybe it would for larger writes.

Anyone got an FPGA programmer and an open-source SATA implementation? :-) (Unfortunately storage protocols have a complex analog side, and except for 1394, I'm not aware of any implementations that separate the digital/analog, which makes prototyping a lot harder, at least without much more detailed documentation on the controllers than you're likely to find.)

-- Anton

[*] Actually, you don't need to delay until the writes have made it to disk, but since you want to compute the checksum as the data goes out to the disk rather than making a second pass over it, you'd need to wait until the data has at least been sent to the drive cache.

[**] For SCSI and FC, there's added complexity in that the drives can request data out-of-order. You can disable this, but at the cost of some performance on high-end drives.

This message posted from opensolaris.org
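The "compute the checksum for each block as it goes out" idea might look roughly like this hypothetical sketch. The fletcher-style accumulator below is a simplification standing in for whatever checksum the pool uses, not the actual ZFS code, and the chunked interface stands in for data being handed to the HBA piece by piece.

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical incremental checksum, updated chunk by chunk as data
 * streams out toward the drive, so no second pass over the buffer is
 * needed. Assumes data is transferred in order (see the [**] caveat
 * above); loosely fletcher-style, not the ZFS implementation.
 */
typedef struct stream_cksum {
    uint64_t a;
    uint64_t b;
} stream_cksum_t;

static void
cksum_init(stream_cksum_t *c)
{
    c->a = 0;
    c->b = 0;
}

static void
cksum_update(stream_cksum_t *c, const uint64_t *chunk, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        c->a += chunk[i];
        c->b += c->a;
    }
}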
Anton B. Rang
2006-Sep-13 16:21 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
> just measured quickly that a 1.2GHz sparc can do [400-500]MB/sec
> of encoding (time spent in misnamed function
> vdev_raidz_reconstruct) for a 3 disk raid-z group.

Strange, that seems very low.

Ah, I see. The current code loops through each buffer, either copying or XORing it into the parity. This likely would perform quite a bit better if it were reworked to go through more than one buffer at a time when doing the XOR. (Reading the partial parity is expensive.)

Actually, this would be an instance where using assembly language, or even processor-dependent code, would be useful. Since the prefetch buffers on UltraSPARC are only applicable to floating-point loads, we should probably use prefetch and the VIS xor instructions. (Even calling bcopy instead of using the existing copy loop would help.)

FWIW, on large systems we ought to be aiming to sustain 8 GB/s or so of writes, and using 16 CPUs just for parity computation seems inordinately painful. :-)

This message posted from opensolaris.org
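For illustration, here is a rough sketch of the rework described above (hypothetical code, not the actual vdev_raidz implementation): instead of streaming each data column through the parity buffer one at a time, several source columns are combined per pass, so the partial parity is read back far less often.

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical multi-source parity pass. Combining several data columns
 * per sweep means the partial parity is loaded and stored once per group
 * of columns instead of once per column, which is the expensive part.
 * Real code would also want prefetching and (on UltraSPARC) VIS xor.
 */
static void
parity_xor_multi(uint64_t *parity, const uint64_t **cols, int ncols,
    size_t nwords)
{
    int c = 0;

    /* Combine columns four at a time. */
    for (; c + 4 <= ncols; c += 4) {
        for (size_t i = 0; i < nwords; i++) {
            parity[i] ^= cols[c][i] ^ cols[c + 1][i] ^
                cols[c + 2][i] ^ cols[c + 3][i];
        }
    }
    /* Handle any leftover columns one at a time. */
    for (; c < ncols; c++) {
        for (size_t i = 0; i < nwords; i++)
            parity[i] ^= cols[c][i];
    }
}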
Anton B. Rang
2006-Sep-13 16:25 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
> With ZFS, however, the in-between cache is obsolete, as individual disk caches can be used
> directly. I also openly question whether even the dedicated RAID HW is faster than the newest
> CPUs in modern servers.

Individual disk caches are typically in the 8-16 MB range; for 15 disks, that gives you at most around 240 MB in aggregate. A RAID array with 15 drives behind it might have 2-4 GB of cache. That's a big improvement.

The dedicated RAID hardware may not be faster than the newest CPUs, but as a friend of mine has pointed out, even though delegating a job to somebody else often means it's done more slowly, it frees him up to do his other work.

(It's also worth pondering the difference between latency and bandwidth. When parity is computed inline with the data path, as is often the case for hardware controllers, the bandwidth is relatively low, since it happens at the speed of data transfer to an individual disk, but the latency is effectively zero, since it adds no time to the transfer.)

This message posted from opensolaris.org
Roch - PAE
2006-Sep-14 09:53 UTC
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
"With ZFS however the in-between cache is obsolete, as individual disk caches can be used directly." The statement needs to be qualified. Storage cache, if protected, works great to reduce critical op latency. ZFS when it writes to disk cache, will flush data out before return to say an O_DSYNC write. The application level latency is not improved by the disk write cache. But a battery protected mirrored storage cache should act as a latency reductor, thus improving some workloads. ____________________________________________________________________________________ Performance, Availability & Architecture Engineering Roch Bourbonnais Sun Microsystems, Icnc-Grenoble Senior Performance Analyst 180, Avenue De L''Europe, 38330, Montbonnot Saint Martin, France Roch.Bourbonnais at Sun.Com http://blogs.sun.com/roller/page/roch