I am trying to determine the best way to move forward with about 35 x86 X4200s. Each box has 4x 73GB internal drives. All the boxes will be built using Solaris 10 11/06. Additionally, these boxes are part of a highly available production environment with an uptime expectation of six 9's (just a few seconds per month of unscheduled downtime allowed).

Ideally, I would like to use a single RaidZ2 pool of all 4 disks, but apparently that is not supported yet. I understand there is the ZFSmount software for making a ZFS root, but I don't think I want to use that for an environment of this grade, and I can't wait until Sun comes out with it integrated later this year... have to use 11/06.

For perspective, these systems are currently running pure UFS, with only 2 of the 4 disks being used in a software RAID 1:

/ = 5GB
/var = 5GB
/tmp = 4GB
/home = 2GB
/data = 50GB

I am looking for recommendations on how to maximize the use of ZFS and minimize the use of UFS without resorting to anything "experimental".

So, assuming that each 73GB disk yields 70GB of usable space: would it make sense to create a UFS root partition of 5GB that is a 4-way mirror across all 4 disks? I haven't used SVM to create these types of mirrors before, so if anyone has any experience here let me know. My expectation is that up to any 3 of the 4 disks could fail while leaving the root partition intact. Basically, every time root has data updated, that data would also be written to each of the other three disks.

So this would leave each disk with 68GB of free space. I would then create a 4GB UFS /tmp (swap) partition that would be 4-way mirrored across all four disks, just as I am suggesting above for the root partition. So again, up to any 3 disks could fail and the swap filesystem would still be intact.

This would leave each disk with 64GB of free space, totaling 256GB. I would then create a single ZFS pool of all the remaining free space on each of the 4 disks.

How should this be done?

Perhaps a form of mirroring? What would be the difference between:

zpool create tank mirror c1d0 c2d0 c3d0 c4d0
or
zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

Would it be better to use RaidZ with a hot spare, or RAIDZ2?

I would like /data, /home, and /var to be able to grow as needed and be able to withstand at least 2 disk failures (doesn't have to be any 2). I am open to using a hot spare.

Suggestions?
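For readers unfamiliar with SVM, a 4-way mirror of the root slice is built roughly like this. This is only a sketch: the c1t0d0-c1t3d0 device names, slice 0 for root, and slice 7 for the state database replicas are assumptions, not the poster's actual layout.

  # State database replicas, spread across all four disks (assumed slice 7)
  metadb -a -f c1t0d0s7 c1t1d0s7 c1t2d0s7 c1t3d0s7

  # One-way concat metadevices, one per root slice (assumed slice 0)
  metainit -f d11 1 1 c1t0d0s0
  metainit d12 1 1 c1t1d0s0
  metainit d13 1 1 c1t2d0s0
  metainit d14 1 1 c1t3d0s0

  # Build the mirror on the submirror holding the live root,
  # then let metaroot update /etc/vfstab and /etc/system
  metainit d10 -m d11
  metaroot d10

  # After the reboot, attach the other three submirrors (each triggers a resync)
  metattach d10 d12
  metattach d10 d13
  metattach d10 d14

The same pattern (without the metaroot step) would apply to a mirrored swap slice, pointing the vfstab swap entry at the mirror metadevice instead.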
On 06 March, 2007 - Matt B sent me these 2,5K bytes:

> I am trying to determine the best way to move forward with about 35 x86 X4200s.
> Each box has 4x 73GB internal drives.
>
> This would leave each disk with 64GB of free space, totaling 256GB. I
> would then create a single ZFS pool of all the remaining free space on
> each of the 4 disks.
>
> How should this be done?
>
> Perhaps a form of mirroring? What would be the difference between:
> zpool create tank mirror c1d0 c2d0 c3d0 c4d0

64GB usable space, any 3 disks can die.

> or
> zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

128GB usable space, 1-2 disks can die.

> Would it be better to use RaidZ with a hot spare, or RAIDZ2?

Raidz + hotspare: 128GB usable space, 1 disk can die .. <pause and hope that nothing bad happens, wait for resilver> .. 1 more disk can die..
raidz2: 128GB usable space, any 2 disks can die at the same time.

> I would like /data, /home, and /var to be able to grow as needed and
> be able to withstand at least 2 disk failures (doesn't have to be any
> 2). I am open to using a hot spare.

4-way mirror has the highest read performance, then 2+2, then probably raidz and finally raidz2. It all depends on your tradeoff of security vs space vs performance.

http://blogs.sun.com/roch/entry/when_to_and_not_to
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
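For reference, the raidz-with-spare and raidz2 layouts Tomas compares would be created along these lines. This is a sketch: the c1d0-c4d0 names are carried over from the original post, and the spare vdev keyword assumes hot-spare support is present in the installed release.

  # Single-parity raidz across three disks plus one hot spare
  zpool create tank raidz c1d0 c2d0 c3d0 spare c4d0

  # Double-parity raidz2 across all four disks
  zpool create tank raidz2 c1d0 c2d0 c3d0 c4d0

  # Either way, verify the resulting layout and redundancy state
  zpool status tank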
Good timing, I'd like some feedback for some work I'm doing below...

Matt B wrote:
> I am trying to determine the best way to move forward with about 35 x86 X4200s.
> Each box has 4x 73GB internal drives.

Cool. Nice box.

> All the boxes will be built using Solaris 10 11/06. Additionally, these boxes
> are part of a highly available production environment with an uptime expectation
> of six 9's (just a few seconds per month of unscheduled downtime allowed).

Just for my curiosity, how do you measure six 9's?

> Ideally, I would like to use a single RaidZ2 pool of all 4 disks, but apparently
> that is not supported yet. I understand there is the ZFSmount software for making
> a ZFS root, but I don't think I want to use that for an environment of this grade,
> and I can't wait until Sun comes out with it integrated later this year... have to
> use 11/06.

It is supported to use 4 disks in a pool, but it isn't yet supported to use ZFS for the root file system. So, you'll end up mixing UFS and ZFS on the same disk, as a likely option.

> For perspective, these systems are currently running pure UFS, with only 2
> of the 4 disks being used in a software RAID 1:
> / = 5GB
> /var = 5GB
> /tmp = 4GB
> /home = 2GB
> /data = 50GB

I'm not a fan of separate /var, it just complicates things. I'll also presume that by "/tmp" you really mean "swap".

> I am looking for recommendations on how to maximize the use of ZFS and minimize
> the use of UFS without resorting to anything "experimental".

Put / in UFS, swap as raw, and everything else in a zpool. I would mirror /+swap on two disks, with the other two disks used as a LiveUpgrade alternate boot environment. When you patch or upgrade, you will normally have better availability (shorter planned outages) with LiveUpgrade. Also, you'll be able to roll back to the previous boot environment, potentially saving more time.

> So, assuming that each 73GB disk yields 70GB of usable space:
> would it make sense to create a UFS root partition of 5GB that is a 4-way mirror
> across all 4 disks? I haven't used SVM to create these types of mirrors before, so
> if anyone has any experience here let me know. My expectation is that up to any 3
> of the 4 disks could fail while leaving the root partition intact. Basically,
> every time root has data updated, that data would also be written to each of the
> other three disks.

I don't see any practical gain for a 4-way mirror over a 3-way mirror. With such configs you are much more likely to see some other fault which will ruin your day (e.g. accidental rm).

> So this would leave each disk with 68GB of free space. I would then create a
> 4GB UFS /tmp (swap) partition that would be 4-way mirrored across all four disks,
> just as I am suggesting above for the root partition. So again, up to
> any 3 disks could fail and the swap filesystem would still be intact.
>
> This would leave each disk with 64GB of free space, totaling 256GB. I would then
> create a single ZFS pool of all the remaining free space on each of the 4 disks.
>
> How should this be done?
>
> Perhaps a form of mirroring? What would be the difference between:
> zpool create tank mirror c1d0 c2d0 c3d0 c4d0
> or
> zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0
>
> Would it be better to use RaidZ with a hot spare, or RAIDZ2?
>
> I would like /data, /home, and /var to be able to grow as needed and be able to
> withstand at least 2 disk failures (doesn't have to be any 2). I am open to using
> a hot spare.
>
> Suggestions?

Prioritize your requirements.
Then take a look at the attached spreadsheet. What the spreadsheet contains is a report from RAIDoptimizer for the type of disk you'll be likely to have, based upon the disk vendor's data sheet (Seagate Savvio). The algorithms are described in my blog, http://blogs.sun.com/relling and an enterprising person could key them into a spreadsheet.

There are 4 main portions of the data:
+ configuration info: raid type, set size, spares, available space
+ mean time to data loss (MTTDL) info: for two different MTTDL models
+ performance info: random read iops and media bandwidths
+ mean time between services (MTBS) info: how often do you expect to repair something

I'm particularly interested in feedback on MTBS. The various MTBS models consider the immediate effect of having a bunch of disks, and the deferred repair strategies of waiting until you have to replace a disk, based upon the RAID config and spares. In any case, a higher MTBS is better, though there is more risk for each MTBS model. Let me know if this is helpful.

As Tomas said, you could look at some of this data in graphical form on my blog, though those graphs assume 46 disks instead of 4. For 4 disks, you have far fewer possible combinations, so it fits reasonably in a spreadsheet.

Caveat: the numbers are computed by algorithms and the code has not yet been verified to properly implement the algorithms. Models are simplifications of real life; don't expect real life to follow a model. If you do follow models, then note that Elizabeth Hurley is off the market :-)
-- richard

[Attachment: for_matt.ods, OpenDocument spreadsheet, 10974 bytes: http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070306/7379ed0b/attachment.ods]
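A rough sketch of the LiveUpgrade flow Richard describes above, with made-up boot environment names, device names, and paths (the alternate boot environment is assumed to live on a root-sized slice of the third disk):

  # Create an alternate boot environment on the spare slice
  lucreate -c be_current -n be_patched -m /:/dev/dsk/c1t2d0s0:ufs

  # Patch or upgrade the inactive environment from an install image
  # (the image path here is purely illustrative)
  luupgrade -u -n be_patched -s /net/installserver/export/s10_image

  # Activate the new environment and boot into it; lustatus shows which is active
  luactivate be_patched
  init 6

If the new environment misbehaves, activating and booting back into the previous one gives the rollback Richard mentions.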
Thanks for the responses. There is a lot there I am looking forward to digesting. Right off the bat, though, I wanted to bring up something I found just before reading this reply, as the answer to this question would automatically answer some other questions.

There is a ZFS best practices wiki at
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
that makes a couple of points:

* Swap space - Because ZFS caches data in kernel addressable memory, the kernel sizes will likely be larger than with other file systems. Configure additional disk-based swap to account for this difference. You can use the size of physical memory as an upper bound to the extra amount of swap space that might be required. Do not use slices on the same disk for both swap space and ZFS file systems. Keep the swap areas separate from the ZFS file systems.

* Do not use slices for storage pools that are intended for production use.

So after reading this, it seems that with only 4 disks to work with, and the fact that UFS for root (initial install) is still required, my only option to conform to the best practice is to use two disks with UFS/RAID 1, leaving only the remaining two disks for 100% ZFS. Additionally, the swap partition would have to go on the UFS set of disks to keep it separate from the ZFS set of disks.

If I am misinterpreting the wiki, please let me know.

There are some tradeoffs here. I would prefer to use a 4-way mirrored slice for a UFS root and a 4-way mirrored slice for UFS swap, and then leave equal slices free for ZFS, but then it sounds like I would have to risk not following the best practice and have to mess with SVM. The nice thing is I could be looking at a 128GB yield with a decent level of fault tolerance.

With zpooling, could I take the 64GB slices, place them into two ZFS stripes (raid0), and then join those two stripes into a ZFS mirror? Seems like then I could get a 128GB yield without having to use RaidZ with a hot spare or RaidZ2, which according to the links I just skimmed performs/lasts below mirroring.

The other option I described above: I could just slap the first two disks into a HW RAID 1, as the X4200s support 2-disk RAID 1, and then slap the remaining 2 disks into ZFS, and this (I think) would not be violating the best practice?

Any thoughts on the best practice points I am raising? It disturbs me that it would make a statement like "don't use slices for production".
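If the hardware-RAID-1 route is taken for the first pair, the on-board controller in an X4200 is normally managed with raidctl. A minimal sketch, assuming the two target disks are c1t0d0 and c1t1d0 (and note that creating the volume destroys the contents of the secondary disk):

  # Mirror the two disks with the on-board controller
  raidctl -c c1t0d0 c1t1d0

  # With no arguments, raidctl reports the state of existing RAID volumes
  raidctl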
On March 7, 2007 8:50:53 AM -0800 Matt B <mattbreedlove at yahoo.com> wrote:
> Any thoughts on the best practice points I am raising? It disturbs me
> that it would make a statement like "don't use slices for production".

I think that's just a performance thing.

-frank
Frank Cusack wrote:
> On March 7, 2007 8:50:53 AM -0800 Matt B <mattbreedlove at yahoo.com> wrote:
>> Any thoughts on the best practice points I am raising? It disturbs me
>> that it would make a statement like "don't use slices for production".
>
> I think that's just a performance thing.

Yep, for those systems with lots of disks.
-- richard
Matt B wrote:
> Thanks for the responses. There is a lot there I am looking forward to digesting.
> Right off the bat, though, I wanted to bring up something I found just before
> reading this reply, as the answer to this question would automatically answer
> some other questions.
>
> There is a ZFS best practices wiki at
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
> that makes a couple of points:
>
> * Swap space - Because ZFS caches data in kernel addressable memory, the kernel
> sizes will likely be larger than with other file systems. Configure additional
> disk-based swap to account for this difference. You can use the size of physical
> memory as an upper bound to the extra amount of swap space that might be required.
> Do not use slices on the same disk for both swap space and ZFS file systems.
> Keep the swap areas separate from the ZFS file systems.

This recommendation is only suitable for low memory systems with lots of disks. Clearly, it would be impractical for a system with a single disk.

> * Do not use slices for storage pools that are intended for production use.
>
> So after reading this, it seems that with only 4 disks to work with, and the fact
> that UFS for root (initial install) is still required, my only option to conform
> to the best practice is to use two disks with UFS/RAID 1, leaving only the remaining
> two disks for 100% ZFS. Additionally, the swap partition would have to go on the
> UFS set of disks to keep it separate from the ZFS set of disks.
>
> If I am misinterpreting the wiki, please let me know.

The best thing about best practices is that there are so many of them :-/
I'll see if I can clarify in the wiki.

> There are some tradeoffs here. I would prefer to use a 4-way mirrored slice for a
> UFS root and a 4-way mirrored slice for UFS swap, and then leave equal slices free
> for ZFS, but then it sounds like I would have to risk not following the best
> practice and have to mess with SVM. The nice thing is I could be looking at a 128GB
> yield with a decent level of fault tolerance.
>
> With zpooling, could I take the 64GB slices, place them into two ZFS stripes (raid0),
> and then join those two stripes into a ZFS mirror? Seems like then I could get a
> 128GB yield without having to use RaidZ with a hot spare or RaidZ2, which according
> to the links I just skimmed performs/lasts below mirroring.

Be careful, with ZFS you don't take a stripe and mirror it (RAID-0+1), you take a mirror and stripe it (RAID-1+0). For example, you would do:

zpool create mycoolpool mirror c_d_t_s_ c_d_t_s_ mirror c_d_t_s_ c_d_t_s_

> The other option I described above: I could just slap the first two disks into a
> HW RAID 1, as the X4200s support 2-disk RAID 1, and then slap the remaining 2 disks
> into ZFS, and this (I think) would not be violating the best practice?

Yes, this would work fine. It would simplify your boot and OS install/upgrade. I'd still recommend planning on using LiveUpgrade -- leave a spare slice for an alternate boot environment.

> Any thoughts on the best practice points I am raising? It disturbs me that it would
> make a statement like "don't use slices for production".

Sometimes it is not what you say, it is how you say it.
-- richard
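Filled in with hypothetical controller/target/slice names, Richard's RAID-1+0 example might look like this:

  # Two 2-way mirrors; ZFS stripes writes across the two mirror vdevs
  zpool create mycoolpool \
      mirror c1t0d0s6 c1t1d0s6 \
      mirror c1t2d0s6 c1t3d0s6

  # Confirm the vdev layout
  zpool status mycoolpool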
So it sounds like the consensus is that I should not worry about using slices with ZFS, and the swap best practice doesn't really apply to my situation of a 4-disk X4200.

So in summary (please confirm), this is what we are saying is a safe bet for use in a highly available production environment?

With 4x 73GB disks yielding 70GB each:

5GB for root, which is UFS and mirrored 4 ways using SVM.
8GB for swap, which is raw and mirrored across the first two disks (optional: skip LiveUpgrade and 4-way mirror this swap partition).
8GB for LiveUpgrade, which is mirrored across the third and fourth disks.

This leaves 57GB of free space on each of the 4 disks in slices.
One ZFS pool will be created containing the 4 slices.
The first two slices will be used in a ZFS mirror yielding 57GB.
The last two slices will be used in a ZFS mirror yielding 57GB.
Then a ZFS stripe (raid0) will be laid over the two mirrors, yielding 114GB of usable space while able to sustain any 2 drives failing without a loss of data.

Thanks

P.S.
Availability is determined by using a synthetic SLA monitor that operates on 2-minute cycles, evaluating against a VIP by an external third party. If there are no errors in the report for the month we hit 100%; I think even one error (due to the 2-minute window) puts us below six 9's... so we basically have a zero-tolerance standard to hit the SLA and not get penalized monetarily.
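Within such a pool, the grow-as-needed file systems are just ZFS datasets. A sketch with an assumed pool name of tank and purely illustrative sizes:

  # Datasets share the pool's free space and grow on demand
  zfs create tank/data
  zfs create tank/home
  zfs set mountpoint=/data tank/data
  zfs set mountpoint=/home tank/home

  # Optional guard rails: cap /home, guarantee space for /data
  zfs set quota=10g tank/home
  zfs set reservation=20g tank/data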
Wade.Stuart at fallon.com
2007-Mar-07 19:08 UTC
[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
zfs-discuss-bounces at opensolaris.org wrote on 03/07/2007 12:31:14 PM:

> So it sounds like the consensus is that I should not worry about
> using slices with ZFS, and the swap best practice doesn't really apply
> to my situation of a 4-disk X4200.
>
> So in summary (please confirm), this is what we are saying is a safe
> bet for use in a highly available production environment?
>
> With 4x 73GB disks yielding 70GB each:
>
> 5GB for root, which is UFS and mirrored 4 ways using SVM.
> 8GB for swap, which is raw and mirrored across the first two disks
> (optional: skip LiveUpgrade and 4-way mirror this swap partition).
> 8GB for LiveUpgrade, which is mirrored across the third and fourth disks.
>
> This leaves 57GB of free space on each of the 4 disks in slices.
> One ZFS pool will be created containing the 4 slices.
> The first two slices will be used in a ZFS mirror yielding 57GB.
> The last two slices will be used in a ZFS mirror yielding 57GB.
> Then a ZFS stripe (raid0) will be laid over the two mirrors, yielding
> 114GB of usable space while able to sustain any 2 drives failing
> without a loss of data.

No, you will be able to sustain up to one disk in each of the two disk pairs failing at any time with no data loss. Lose both disks in a mirror pair and you lose data (and the system panics) -- slightly different than "any two disks".

> Thanks
>
> P.S.
> Availability is determined by using a synthetic SLA monitor that
> operates on 2-minute cycles, evaluating against a VIP by an external
> third party. If there are no errors in the report for the month we
> hit 100%; I think even one error (due to the 2-minute window) puts
> us below six 9's... so we basically have a zero-tolerance standard to
> hit the SLA and not get penalized monetarily.
Matt B wrote:
> Any thoughts on the best practice points I am raising? It disturbs me
> that it would make a statement like "don't use slices for
> production".

ZFS turns on the write cache on the disk if you give it the entire disk to manage. It is good for performance. So, you should use whole disks whenever possible. Slices work too, but the write cache for the disk will not be turned on by ZFS.

Cheers
Manoj
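To check (or force) the write cache state on a given disk, the expert mode of format exposes a cache menu. A sketch of the interactive session (menu entries can vary with the disk and driver, so treat this as illustrative):

  # format -e, select the disk, then:
  format> cache
  cache> write_cache
  write_cache> display    (shows whether the write cache is currently enabled)
  write_cache> enable     (turns it on manually; risky if UFS slices share the
                           disk, since UFS does not issue cache flushes)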
Manoj Joseph writes:
> Matt B wrote:
> > Any thoughts on the best practice points I am raising? It disturbs me
> > that it would make a statement like "don't use slices for
> > production".
>
> ZFS turns on the write cache on the disk if you give it the entire disk to
> manage. It is good for performance. So, you should use whole disks whenever
> possible.

Just a small clarification to state that the extra performance that comes from having the "write cache on" applies mostly to disks that do not have other means of command concurrency (NCQ, CTQ). With NCQ/CTQ, the write cache setting should not matter much to ZFS performance.

-r

> Slices work too, but the write cache for the disk will not be turned on by ZFS.
>
> Cheers
> Manoj
Robert Milkowski
2007-Mar-08 11:38 UTC
[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
Hello Matt,

Wednesday, March 7, 2007, 7:31:14 PM, you wrote:

MB> So it sounds like the consensus is that I should not worry about using slices with ZFS,
MB> and the swap best practice doesn't really apply to my situation of a 4-disk X4200.

MB> So in summary (please confirm), this is what we are saying is a
MB> safe bet for use in a highly available production environment?

MB> With 4x 73GB disks yielding 70GB each:

MB> 5GB for root, which is UFS and mirrored 4 ways using SVM.
MB> 8GB for swap, which is raw and mirrored across the first two disks
MB> (optional: skip LiveUpgrade and 4-way mirror this swap partition).
MB> 8GB for LiveUpgrade, which is mirrored across the third and fourth disks.

MB> This leaves 57GB of free space on each of the 4 disks in slices.
MB> One ZFS pool will be created containing the 4 slices.
MB> The first two slices will be used in a ZFS mirror yielding 57GB.
MB> The last two slices will be used in a ZFS mirror yielding 57GB.
MB> Then a ZFS stripe (raid0) will be laid over the two mirrors,
MB> yielding 114GB of usable space while able to sustain any 2 drives failing without a loss of data.

Eventually, if you care about how much storage is available, then:

1. 8GB on two disks for / in a mirrored config (SVM)
2. 8GB on another two disks for swap in a mirrored config (SVM)
3. the rest of the disks for ZFS
   a. raidz2 over 4 slices: capacity of 2x slice, bad random read performance
   b. raid-10 over 4 slices: capacity of 2x slice, good read performance, less reliability than a.

You lose the ability to do LU, but you gain some storage.

--
Best regards,
Robert                    mailto:rmilkowski at task.gda.pl
                          http://milek.blogspot.com
Frank Cusack writes:
> On March 7, 2007 8:50:53 AM -0800 Matt B <mattbreedlove at yahoo.com> wrote:
> > Any thoughts on the best practice points I am raising? It disturbs me
> > that it would make a statement like "don't use slices for production".
>
> I think that's just a performance thing.

Right. I think what would be very suboptimal from a ZFS standpoint would be to configure 2 slices from _one_ disk into a given zpool. This would send the I/O scheduler on a tangent, but it would nevertheless still work.

> -frank