Hi all,

My company will be acquiring the Sun SE6920 for our storage virtualization project, and we intend to use quite a bit of ZFS as well. The two technologies seem somewhat at odds, since the 6920 means layers of hardware abstraction while ZFS seems to prefer more direct access to disk.

I tried to search around but couldn't find any performance numbers for ZFS on the SE6920, nor any recommendations on where to start or what the considerations might be. I'd appreciate any hints in this area.

Thanks.

--
Just me,
Wire ...
Wee Yeh Tan wrote:
> My company will be acquiring the Sun SE6920 for our storage
> virtualization project, and we intend to use quite a bit of ZFS as
> well. The two technologies seem somewhat at odds, since the 6920 means
> layers of hardware abstraction while ZFS seems to prefer more direct
> access to disk.

Yes, and then again, no. What you have with the SE6920 is a rack which provides you with hardware redundancy and a whopping great cache. For my money, as long as you configure multiple paths from your attached hosts and ensure that each LUN the SE6920 presents has at least two paths to your ZFS host, you should be just fine.

> I tried to search around but couldn't find any performance numbers for
> ZFS on the SE6920, nor any recommendations on where to start or what the
> considerations might be. I'd appreciate any hints in this area.

Ah, well, that really depends on what your target usage mode is and what you really want to achieve. With a hardware RAID array you've got a lot of knobs to tweak (so to speak), and you can also decide between mirrors and raidz/raidz2 with ZFS. I suggest that you contact Roch Bourbonnais or Richard Elling, since this is really their area. (They're both @Sun.COM, btw.)

best regards,
James C. McPherson
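A minimal sketch of the multipath-plus-ZFS setup James describes, assuming MPxIO is used to collapse the two paths per LUN into a single device node (the pool name and device names below are invented examples, not taken from the thread):

    # Enable Solaris I/O multipathing so each 6920 LUN appears once
    # under /dev/dsk even though it has two physical paths.
    stmsboot -e        # prompts for the reboot it needs

    # Build the pool on the multipathed LUNs; ZFS sees one device per
    # LUN and MPxIO handles path failover underneath.
    zpool create tank \
        mirror c6t60003BA27D2120d0 c6t60003BA27D2121d0 \
        mirror c6t60003BA27D2122d0 c6t60003BA27D2123d0

    # Verify the pool layout and its health.
    zpool status tank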
On Wednesday 16 August 2006 11:55, Wee Yeh Tan wrote:
> Hi all,
>
> My company will be acquiring the Sun SE6920 for our storage
> virtualization project, and we intend to use quite a bit of ZFS as
> well. The two technologies seem somewhat at odds, since the 6920 means
> layers of hardware abstraction while ZFS seems to prefer more direct
> access to disk.

I suggest reading the Best Practices for the Sun StorEdge 6920 System - 819-0122-10 (v3.0):
http://docs.sun.com/app/docs?q=819-0122-10

Regards,
Jerome

> I tried to search around but couldn't find any performance numbers for
> ZFS on the SE6920, nor any recommendations on where to start or what the
> considerations might be. I'd appreciate any hints in this area.
>
> Thanks.

--
Jerome Haynes-Smith
Sun PTS Storage EMEA
Wee Yeh Tan wrote:
> Hi all,
>
> My company will be acquiring the Sun SE6920 for our storage
> virtualization project, and we intend to use quite a bit of ZFS as
> well. The two technologies seem somewhat at odds, since the 6920 means
> layers of hardware abstraction while ZFS seems to prefer more direct
> access to disk.

Not at odds. I would say overlapping in some areas and, in some cases, complementary, but not at odds unless misconfigured.

First, ZFS prefers access to LUNs. In many cases it prefers access to lots of LUNs, for file system integrity as well as performance. The composition of those LUNs can be just about anything, as long as ZFS can read and write blocks in a sane manner. Sure, the simpler the config the better for overall manageability, but in many cases storage configurations increase in complexity as they increase in functionality.

Second, as you noted above, the 6920 lets you, among other things, virtualize the underlying storage. It lets you do this for multiple hosts and OSes at the block level. Today, ZFS is a Solaris 10 single-host filesystem. Placing ZFS on the LUNs exported from a 6920 to your Solaris 10 hosts is one option you should definitely look into. Unfortunately, you might not be able to use it for all of the hosts you intend to hook up to the 6920.

Third, you need to look at the overall system configuration and architecture when making storage decisions. Does it make sense to use RAID-Z on the host on top of a RAID-1 volume from the 6920 that sits on a RAID-5 volume in the T3B you have in a rack across the room? Probably not. (In fact, it never does, but I digress....) Does it make sense to use a base RAID level within the 6920 that exports LUNs to all of your data center, perhaps with the assumption that you'll then use a simple ZFS configuration on top of it for the hosts that run ZFS? Again, it depends on the config and what you're trying to accomplish overall, as well as the applications and host specifics.
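As a rough illustration of the two sane ends of that spectrum -- a sketch with invented pool and LUN names, not a recommendation from the thread:

    # Option A: the 6920 volumes are already RAID-protected, so the
    # pool is a plain stripe of those LUNs (no parity stacked on parity;
    # ZFS still detects corruption but cannot self-heal it).
    zpool create datapool \
        c4t60003BA000A1d0 c4t60003BA000A2d0 c4t60003BA000A3d0

    # Option B: the 6920 exports unprotected volumes and ZFS supplies
    # the one level of redundancy by mirroring across them.
    zpool create datapool \
        mirror c4t60003BA000B1d0 c5t60003BA000B2d0 \
        mirror c4t60003BA000B3d0 c5t60003BA000B4d0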
WYT said:
> Hi all,
>
> My company will be acquiring the Sun SE6920 for our storage
> virtualization project, and we intend to use quite a bit of ZFS as
> well. The two technologies seem somewhat at odds, since the 6920 means
> layers of hardware abstraction while ZFS seems to prefer more direct
> access to disk.
>
> I tried to search around but couldn't find any performance numbers for
> ZFS on the SE6920, nor any recommendations on where to start or what the
> considerations might be. I'd appreciate any hints in this area.
>
> Thanks.
> --
> Just me,
> Wire ...

My general principles are:

  If you can, to improve your 'Availability' metrics,
  let ZFS handle one level of redundancy.

  For random-read performance, prefer mirrors over
  raid-z. If you use raid-z, group together a smallish
  number of volumes.

  Set up volumes that correspond to a small number of
  drives (the smallest you can bear) with a volume
  interlace that is in the [1M-4M] range.

And next, a very, very important thing that we will have to pursue with storage manufacturers, including ourselves:

  In cases where the storage cache is to be considered
  "stable storage" in the face of power failure, we
  have to be able to configure the storage to ignore
  the "flush write cache" commands that ZFS issues.

Some storage does ignore the flush out of the box; other storage doesn't. It should be easy to verify the latency of a small O_DSYNC write. On a quiet system, I expect sub-millisecond response. 5ms to a battery-protected cache should be red-flagged.

This was just filed to track the issue:

  6460889 zil shouldn't send write-cache-flush command to <some> devices

Note also that S10U2 has already been greatly improved performance-wise; tracking releases is very important.

-r

____________________________________________________________________________________
Performance, Availability & Architecture Engineering

Roch Bourbonnais                 Sun Microsystems, Icnc-Grenoble
Senior Performance Analyst       180, Avenue De L'Europe, 38330,
                                 Montbonnot Saint Martin, France
http://icncweb.france/~rbourbon  http://blogs.sun.com/roller/page/roch
Roch.Bourbonnais at Sun.Com       (+33).4.76.18.83.20
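One rough way to verify the small-O_DSYNC-write latency Roch mentions above, sketched as a DTrace one-liner (not from the thread; run it on an otherwise quiet system while a small synchronous writer targets the pool, then interrupt it with Ctrl-C to print the histogram):

    # Histogram of write(2) latency while the O_DSYNC test writer runs.
    # Sub-millisecond buckets suggest the array is acknowledging from
    # cache; ~5ms or worse to battery-backed cache deserves a red flag.
    dtrace -n '
        syscall::write:entry  { self->ts = timestamp; }
        syscall::write:return /self->ts/ {
            @lat["write(2) latency (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
        }'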
Hello Roch,

Thursday, August 17, 2006, 11:08:37 AM, you wrote:

R> My general principles are:
R>
R>   If you can, to improve your 'Availability' metrics,
R>   let ZFS handle one level of redundancy.
R>
R>   For random-read performance, prefer mirrors over
R>   raid-z. If you use raid-z, group together a smallish
R>   number of volumes.
R>
R>   Set up volumes that correspond to a small number of
R>   drives (the smallest you can bear) with a volume
R>   interlace that is in the [1M-4M] range.

Why that big an interlace? With lots of small reads it could actually introduce large overhead, right? I can understand something like 960KB, but 4M?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Robert Milkowski writes:
> Hello Roch,
>
> Thursday, August 17, 2006, 11:08:37 AM, you wrote:
>
> R> My general principles are:
> R>
> R>   If you can, to improve your 'Availability' metrics,
> R>   let ZFS handle one level of redundancy.
> R>
> R>   For random-read performance, prefer mirrors over
> R>   raid-z. If you use raid-z, group together a smallish
> R>   number of volumes.
> R>
> R>   Set up volumes that correspond to a small number of
> R>   drives (the smallest you can bear) with a volume
> R>   interlace that is in the [1M-4M] range.
>
> Why that big an interlace? With lots of small reads it could actually
> introduce large overhead, right? I can understand something like
> 960KB, but 4M?

I also think we should be fine with 1M. I'm not sure what overhead we're talking about here -- did you mean large skew? During a pool sync, at least one of interest, we expect to have lots of data to sync; even if it's just 1GB, a 4M interlace still spreads it across 256 disks.

-r
Thanks to all who have responded. I spent two weekends working through the best practices that Jerome recommended -- it's quite a mouthful.

On 8/17/06, Roch <Roch.Bourbonnais at sun.com> wrote:
> My general principles are:
>
>   If you can, to improve your 'Availability' metrics,
>   let ZFS handle one level of redundancy.

Cool. This is a good way to take advantage of the error-detection/correction features in ZFS. We will definitely take this suggestion!

>   For random-read performance, prefer mirrors over
>   raid-z. If you use raid-z, group together a smallish
>   number of volumes.
>
>   Set up volumes that correspond to a small number of
>   drives (the smallest you can bear) with a volume
>   interlace that is in the [1M-4M] range.

I have a hard time picturing this with respect to the 6920 storage pool. The internal disks in the 6920 present up to 2 VDs per array (6-7 disks each?). The storage pool will be built from a bunch of these VDs and may be further partitioned into several volumes, with each volume presented to a ZFS host. What should the storage profile look like? I can probably do a stripe profile, since I can leave the redundancy to ZFS.

To complicate matters, we are likely going to attach all our 3510s to the 6920 and use some of these for the ZFS volumes, so further restrictions may apply. Are we better off doing a direct attach?

> And next, a very, very important thing that we will have to pursue
> with storage manufacturers, including ourselves:
>
>   In cases where the storage cache is to be considered
>   "stable storage" in the face of power failure, we
>   have to be able to configure the storage to ignore
>   the "flush write cache" commands that ZFS issues.
>
> Some storage does ignore the flush out of the box; other storage
> doesn't. It should be easy to verify the latency of a small O_DSYNC
> write. On a quiet system, I expect sub-millisecond response. 5ms to a
> battery-protected cache should be red-flagged.
>
> This was just filed to track the issue:
>   6460889 zil shouldn't send write-cache-flush command to <some> devices

Noted.

> Note also that S10U2 has already been greatly improved
> performance-wise; tracking releases is very important.
>
> -r

--
Just me,
Wire ...
Hello Wee,

Saturday, August 26, 2006, 6:43:05 PM, you wrote:

WYT> Thanks to all who have responded. I spent two weekends working through
WYT> the best practices that Jerome recommended -- it's quite a mouthful.
WYT>
WYT> On 8/17/06, Roch <Roch.Bourbonnais at sun.com> wrote:
>> My general principles are:
>>
>>   If you can, to improve your 'Availability' metrics,
>>   let ZFS handle one level of redundancy.
WYT>
WYT> Cool. This is a good way to take advantage of the
WYT> error-detection/correction features in ZFS. We will definitely take
WYT> this suggestion!
WYT>
>>   For random-read performance, prefer mirrors over
>>   raid-z. If you use raid-z, group together a smallish
>>   number of volumes.
>>
>>   Set up volumes that correspond to a small number of
>>   drives (the smallest you can bear) with a volume
>>   interlace that is in the [1M-4M] range.
WYT>
WYT> I have a hard time picturing this with respect to the 6920 storage pool.
WYT> The internal disks in the 6920 present up to 2 VDs per array (6-7 disks
WYT> each?). The storage pool will be built from a bunch of these VDs and
WYT> may be further partitioned into several volumes, with each volume
WYT> presented to a ZFS host. What should the storage profile look like?
WYT> I can probably do a stripe profile, since I can leave the redundancy to
WYT> ZFS.

IMHO, if you have a VD, make just one partition and present it as a LUN to ZFS. Do not present several partitions from the same disks to ZFS as different LUNs.

WYT> To complicate matters, we are likely going to attach all our 3510s to
WYT> the 6920 and use some of these for the ZFS volumes, so further
WYT> restrictions may apply. Are we better off doing a direct attach?

You can attach 3510 JBODs (I guess) directly -- but currently there are restrictions: only one host and no MPxIO. If that's OK, it looks like you'll get better performance than going with a 3510 head unit.

ps. I did try with MPxIO and two hosts connected, with several JBODs -- and I did see FC loop logout/login, etc.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
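A sketch of how Robert's one-LUN-per-VD advice might look from the ZFS side (device names are invented; the raid-z width just follows Roch's earlier "smallish group" suggestion):

    # One LUN per 6920 virtual disk, so no two pool devices share
    # spindles; ZFS then builds its redundancy across whole VDs.
    zpool create tank \
        raidz c6t60020F20000C1d0 c6t60020F20000C2d0 \
              c6t60020F20000C3d0 c6t60020F20000C4d0 c6t60020F20000C5d0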
On 8/28/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Saturday, August 26, 2006, 6:43:05 PM, you wrote:
>
> WYT> I have a hard time picturing this with respect to the 6920 storage pool.
> WYT> The internal disks in the 6920 present up to 2 VDs per array (6-7 disks
> WYT> each?). The storage pool will be built from a bunch of these VDs and
> WYT> may be further partitioned into several volumes, with each volume
> WYT> presented to a ZFS host. What should the storage profile look like?
> WYT> I can probably do a stripe profile, since I can leave the redundancy to
> WYT> ZFS.
>
> IMHO, if you have a VD, make just one partition and present it as a LUN to
> ZFS. Do not present several partitions from the same disks to ZFS as
> different LUNs.

I'm a real newbie here, as you can probably tell, and this is one of the aspects I'm struggling with. The challenge seems to be putting in more spindles without increasing the volume stripe size. ZFS on simple disks manages itself nicely.

When constructing the VDs from the 3510, we will likely stripe across 2-3 disks. For the virtualization strategy on the 6920, we will probably go with concat; I don't imagine that striping here will go well with ZFS. I just have to be careful not to present volumes from the same VDs. Another alternative would be to present the VDs directly, without virtualization. Cost is the primary concern for this project.

> WYT> To complicate matters, we are likely going to attach all our 3510s to
> WYT> the 6920 and use some of these for the ZFS volumes, so further
> WYT> restrictions may apply. Are we better off doing a direct attach?
>
> You can attach 3510 JBODs (I guess) directly -- but currently there are
> restrictions: only one host and no MPxIO. If that's OK, it looks like
> you'll get better performance than going with a 3510 head unit.
>
> ps. I did try with MPxIO and two hosts connected, with several JBODs --
> and I did see FC loop logout/login, etc.

I saw your benchmarking results. Great work there.

--
Just me,
Wire ...