Hiya, I am just in the planning stages for my ZFS Home Media Server build at the moment (to replace WHS v1).

I plan to use 2x motherboard ports and 2x Supermicro AOC-SASLP-MV8 8-port SATA cards to give 17* drive connections; 2 disks (120GB SATA 2.5") will be used for the ZFS install using the motherboard ports, and the remaining 15 disks (1TB SATA) will be used for data using the 2x 8-port cards.

* = the total number of ports is 18 but I only have enough space in the chassis for 17 drives (2x 2.5" in 1x 3.5" bay and 15x 3.5" using 5-in-3 hot-swap caddies in 9x 5.25" bays). All disks are 5400RPM to keep power requirements down.

The ZFS install will be mirrored, but I am not sure how to configure the 15 data disks from a performance (inc. resilvering) vs protection vs usable space perspective;

3x 5 disk raid-z. 3 disk failures in the right scenario, 12TB storage
2x 7 disk raid-z + hot spare. 2 disk failures in the right scenario, 12TB storage
1x 15 disk raid-z2. 2 disk failures, 13TB storage
2x 7 disk raid-z2 + hot spare. 4 disk failures in the right scenario, 10TB storage

Without having a mash of different raid-z* levels I can't think of any other options. I am leaning towards the first option as it gives separation between all the disks; I would have separate Movie folders on each of them while having critical data (pictures, home videos, documents etc) stored on each set of raid-z.

Suggestions welcomed.

Thanks
--
This message posted from opensolaris.org
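For reference, the first option (3x 5-disk raid-z vdevs in a single pool) would be built roughly like this; the pool name and device names below are made up and would need to match your own c*t*d* names:

  # three 5-disk raid-z vdevs in one pool (hypothetical device names)
  zpool create mediapool \
      raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
      raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0 \
      raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0
  zpool status mediapool

Each raidz group becomes one top-level vdev, and writes are striped across all three.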
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Lanky Doodle
>
> The ZFS install will be mirrored, but I am not sure how to configure the 15
> data disks from a performance (inc. resilvering) vs protection vs usable space
> perspective;
>
> 3x 5 disk raid-z. 3 disk failures in the right scenario, 12TB storage
> 2x 7 disk raid-z + hot spare. 2 disk failures in the right scenario, 12TB storage
> 1x 15 disk raid-z2. 2 disk failures, 13TB storage
> 2x 7 disk raid-z2 + hot spare. 4 disk failures in the right scenario, 10TB storage

The above all provide the highest usable space (lowest hardware cost). But if you want performance, go for mirrors (highest hardware cost).
Thanks Edward.

I'm in two minds with mirrors. I know they provide the best performance and protection, and if this was a business critical machine I wouldn't hesitate.

But as it is for a home media server, which is mainly WORM access and will be storing (legal!) DVD/Blu-ray rips, I'm not so sure I can sacrifice the space.

7x 2-way mirrors would give me 7TB usable with 1 hot spare, using 1TB disks, which is a big drop from 12TB! I could always jump to 2TB disks giving me 14TB usable, but I already have 6x 1TB disks in my WHS build which I'd like to re-use.

Hmmmmmmm....!
--
This message posted from opensolaris.org
I am assuming you will put all of the vdevs into a single pool, which is a good idea unless you have a specific reason for keeping them separate, e.g. you want to be able to destroy / rebuild a particular vdev while leaving the others intact.

Fewer disks per vdev implies more vdevs, providing better random performance, lower scrub and resilver times, and the ability to expand a vdev by replacing only the few disks in it. The downside of more vdevs is that you dedicate parity to each vdev, e.g. a RAIDZ2 would need two parity disks per vdev.

> I'm in two minds with mirrors. I know they provide
> the best performance and protection, and if this was
> a business critical machine I wouldn't hesitate.
>
> But as it is for a home media server, which is mainly
> WORM access and will be storing (legal!) DVD/Blu-ray
> rips, I'm not so sure I can sacrifice the space.

For a home media server, all accesses are essentially sequential, so random performance should not be a deciding factor.

> 7x 2-way mirrors would give me 7TB usable with 1 hot
> spare, using 1TB disks, which is a big drop from
> 12TB! I could always jump to 2TB disks giving me 14TB
> usable, but I already have 6x 1TB disks in my WHS
> build which I'd like to re-use.

I would be tempted to start with a 4+2 (six disk RAIDZ2) vdev using your current disks and plan from there. There is no reason you should feel compelled to buy more 1TB disks just because you already have some.

> Am I right in saying that single disks cannot be
> added to a raid-z* vdev so a minimum of 3 would be
> required each time. However a mirror is just 2 disks
> so if adding disks over a period of time mirrors
> would be cheaper each time.

That is not correct. You cannot ever add disks to a vdev. Well, you can add additional disks to a mirror vdev, but otherwise, once you set the geometry, a vdev is stuck for life.

However, you can add any vdev you want to an existing pool. You can take a pool with a single vdev set up as a 6x RAIDZ2 and add a single disk to that pool. The previous example is a horrible idea because it makes the entire pool dependent upon a single disk. The example also illustrates that you can add any type of vdev to a pool. Most agree it is best to make the pool from vdevs of identical geometry, but that is not enforced by zfs.
--
This message posted from opensolaris.org
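To make the vdev-versus-pool distinction concrete: growing a pool is a zpool add of a whole new vdev, not adding disks into an existing raidz. A rough sketch, with made-up pool and device names:

  # add a second 6-disk raidz2 vdev to an existing pool
  zpool add tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

  # zpool warns about a mismatched replication level if you try to add
  # something with less redundancy (e.g. a single bare disk); -f overrides
  # the warning, which is exactly the horrible idea described above

That warning (and the -f needed to bypass it) is about the only guard ZFS gives you against mixing vdev types in one pool.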
Thanks martysch.

That is what I meant about adding disks to vdevs - not adding disks to vdevs but adding vdevs to pools.

If the geometry of the vdevs should ideally be the same, it would make sense to buy one more disk now and have a 7 disk raid-z2 to start with, then buy disks as and when and create a further 7 disk raid-z2, leaving the 15th disk as a hot spare. That would 'only' give 10TB usable though.

The only thing is, I seem to remember reading that if you add vdevs to a pool long after the pool was created and data has been written to it, things aren't spread evenly - is that right? So it might actually make sense to buy all the disks now and start fresh with the final build.

Starting with only 6 disks would leave growth for another 6 disk raid-z2 (to keep matching geometry), leaving 3 disks spare, which is not ideal.
--
This message posted from opensolaris.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Lanky Doodle
>
> But as it is for a home media server, which is mainly WORM access and will be
> storing (legal!) DVD/Blu-ray rips, I'm not so sure I can sacrifice the space.

For your purposes, raidzN will work very well. And since you're going to sequentially write your data once initially and leave it in place, even the resilver should perform pretty well.
Thanks Edward.

In that case what 'option' would you choose - smaller raid-z vdevs or larger raid-z2 vdevs? I do like the idea of having a hot spare, so 2x 7 disk raid-z2 may be the better option rather than 3x 5 disk raid-z with no hot spare. The 2TB loss in the former could be acceptable, I suppose, for the sake of better protection.

When 4-5TB drives come to market, 2-3TB drives will drop in price so I could always upgrade them - can you do this with raid-z vdevs, in terms of autoexpand?

There might be the odd deletion here and there if a movie is truly turd, but as you say, 99% of the time it will be written and left.
--
This message posted from opensolaris.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Lanky Doodle
>
> In that case what 'option' would you choose - smaller raid-z vdevs or larger
> raid-z2 vdevs.

The more redundant disks you have, the more protection you get, and the smaller the available disk space. So that's entirely up to you.

> When 4-5TB
> drives come to market, 2-3TB drives will drop in price so I could always
> upgrade them - can you do this with raid-z vdevs, in terms of autoexpand?

Yup. But you won't see any increase until you replace all the drives in the vdev.

> There might be the odd deletion here and there if a movie is truly turd, but
> as you say, 99% of the time it will be written and left.

That won't matter. The thing that matters is ... file fragmentation. For example, if you run bittorrent directly onto the file server, then you're going to get terrible performance for everything, because bittorrent grabs tiny little fragments all over the place in essentially random order. But if you rip directly from disc to a file, then it'll be fine because it's all serialized. Or use bittorrent onto your laptop and then copy the file all at once to the server.

The thing that's bad for performance, especially on raidz, is when you're performing lots of small random operations. And when you come back to a large file making small random modifications after it has already been written and snapshotted...
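The in-place upgrade being described would look roughly like this; pool and device names are hypothetical, and each resilver must finish before the next disk is swapped:

  # allow the pool to grow automatically once all disks in a vdev are bigger
  zpool set autoexpand=on tank

  # replace the vdev's disks one at a time with larger ones
  zpool replace tank c3t0d0 c4t0d0
  zpool status tank     # wait for the resilver to complete, then do the next

  # the extra space only shows up after the last disk in that vdev is replaced

If autoexpand was left off, the same effect can be had afterwards with "zpool online -e" on the replaced disks.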
That's how I understood autoexpand, about not doing so until all disks have been done.

I do indeed rip from disc rather than grab torrents - to VIDEO_TS folders and not ISO - on my laptop, then copy the whole folder up to WHS in one go. So while they're not one large single file, they are lots of small .vob files, written in one hit.

This is a bit OT, but can you have one vdev that is a duplicate of another vdev? By that I mean, say you had 2x 7 disk raid-z2 vdevs, instead of them both being used in one large pool, could you have one that is a backup of the other, allowing you to destroy one of them and re-build without data loss?
--
This message posted from opensolaris.org
> This is a bit OT, but can you have one vdev that is a duplicate
> of another vdev? By that I mean, say you had 2x 7 disk raid-z2
> vdevs, instead of them both being used in one large pool, could
> you have one that is a backup of the other, allowing you to
> destroy one of them and re-build without data loss?

At least two ways I can think of: maybe you can make a mirror of raidz top-level vdevs, or simply use regular zfs send/recv syncs. This may possibly solve some of the fragmentation troubles by regrouping blocks during send/recv - but I asked about this recently on the list and did not get a definite answer.

//Jim
On Wed, Jun 15, 2011 at 8:20 AM, Lanky Doodle <lanky_doodle at hotmail.com> wrote:
> That's how I understood autoexpand, about not doing so until all disks have been done.
>
> I do indeed rip from disc rather than grab torrents - to VIDEO_TS folders and not ISO - on my laptop, then copy the whole folder up to WHS in one go. So while they're not one large single file, they are lots of small .vob files, written in one hit.

I decided on 3x 6-drive RAID-Z2s for my home media server, made up of 2TB drives (a mix of Barracuda LP 5900rpm and 5K3000); it's been quite solid so far. Performance is entirely limited by GigE.

--khd
It sounds like you are getting a good plan together.

> The only thing is, I seem to remember reading that if you add vdevs to
> a pool long after the pool was created and data has been written to it,
> things aren't spread evenly - is that right? So it might actually make
> sense to buy all the disks now and start fresh with the final build.

In this scenario, balancing would not impact your performance. You would start with the performance of a single vdev. Adding the second vdev later will only increase performance, even if horribly imbalanced. Over time it will start to balance itself. If you want it balanced, you can force zfs to start balancing by copying files then deleting the originals.

> Starting with only 6 disks would leave growth for another 6 disk
> raid-z2 (to keep matching geometry), leaving 3 disks spare, which is
> not ideal.

Maintaining identical geometry only matters if all of the disks are identical. If you later add 2TB disks, then pick whatever geometry works for you. The most important thing is to maintain consistent vdev types, e.g. all RAIDZ2.

> I do like the idea of having a hot spare

I'm not sure I agree. In my anecdotal experience, sometimes my array would offline (for whatever reason) and zfs would try to replace as many disks as it could with the hot spares. If there weren't enough hot spares for the whole array, then the pool was left irreversibly damaged, having several disks in the middle of being replaced. This has only happened once or twice, and in the panic I might have handled it incorrectly, but it has spooked me from having hot spares.

> This is a bit OT, but can you have one vdev that is a duplicate of
> another vdev? By that I mean, say you had 2x 7 disk raid-z2 vdevs,
> instead of them both being used in one large pool, could you have one
> that is a backup of the other, allowing you to destroy one of them
> and re-build without data loss?

Absolutely. I do this very thing with large, slow disks holding a backup for the main disks. My home server has an SMF service which regularly synchronizes the time-slider snapshots from each main pool to the backup pool. This has saved me when a whole pool disappeared (see above) and has allowed me to make changes to the layout of the main pools.
--
This message posted from opensolaris.org
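The copy-then-delete rebalancing mentioned above is nothing more than rewriting files so their new blocks get allocated across all vdevs; a crude sketch with a made-up path (note that existing snapshots will keep the old copies referenced, so the space only balances out once those snapshots expire):

  # rewrite a file so its blocks are re-allocated across the whole pool
  cp /tank/movies/film.vob /tank/movies/film.vob.tmp && \
      mv /tank/movies/film.vob.tmp /tank/movies/film.vob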
Hi Lanky,

If you created a mirrored pool instead of a RAIDZ pool, you could use the zpool split feature to split your mirrored pool into two identical pools. For example, if you had a 3-way mirrored pool, your primary pool would remain redundant with 2-way mirrors after the split. Then you would have a non-redundant pool as a backup. You could also attach more disks to the backup pool to make it redundant.

At the end of the week or so, destroy the non-redundant pool, re-attach the disks to your primary pool, and repeat. This is what I would do, with daily snapshots and a monthly backup.

Make sure you develop a backup strategy for any pool you build.

Thanks,

Cindy

On 06/15/11 06:20, Lanky Doodle wrote:
> That's how I understood autoexpand, about not doing so until all disks have been done.
>
> I do indeed rip from disc rather than grab torrents - to VIDEO_TS folders and not ISO - on my laptop, then copy the whole folder up to WHS in one go. So while they're not one large single file, they are lots of small .vob files, written in one hit.
>
> This is a bit OT, but can you have one vdev that is a duplicate of another vdev? By that I mean, say you had 2x 7 disk raid-z2 vdevs, instead of them both being used in one large pool, could you have one that is a backup of the other, allowing you to destroy one of them and re-build without data loss?
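The split/re-attach cycle Cindy describes would look something like the following on a mirrored pool; all pool and device names are hypothetical:

  # detach one side of every mirror into a new pool called "backup"
  zpool split tank backup

  # the new pool is left exported by default; import it to use or verify it
  zpool import backup

  # at the end of the cycle, destroy it and put the disks back into the mirrors
  zpool destroy backup
  zpool attach tank c2t0d0 c2t1d0      # repeat per mirror pair

zpool split only works on pools made entirely of mirror vdevs, which is one more factor in the mirrors-versus-raidz trade-off discussed earlier.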
> 3x 5 disk raid-z. 3 disk failures in the right scenario, 12TB storage
> 2x 7 disk raid-z + hot spare. 2 disk failures in the right scenario, 12TB storage
> 1x 15 disk raid-z2. 2 disk failures, 13TB storage
> 2x 7 disk raid-z2 + hot spare. 4 disk failures in the right scenario, 10TB storage

If paranoid, use two RAIDz2 VDEVs and a spare. If not, use a single RAIDz2 or RAIDz3 VDEV with 14-15 drives and 1-2 spares. If you choose two VDEVs, replacing the drives in one of them with bigger ones as the pool grows will be more flexible, but may lead to badly balanced pools (although I just saw that fixed in Illumos/OpenIndiana - dunno about s11ex, fbsd or other platforms).

Personally, I'm a bit paranoid, and prefer to use smaller VDEVs. With 7 drives per VDEV in RAIDz2, and a spare, you may still have sufficient space for some time. If this isn't backed up somewhere else, I'd be a wee bit paranoid indeed :)

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Thanks guys.

I have decided to bite the bullet and change to 2TB disks now rather than go through all the effort using 1TB disks and then maybe changing in 6-12 months time or whatever. The price difference between 1TB and 2TB disks is marginal and I can always re-sell my 6x 1TB disks.

I think I have also narrowed down the raid config to the following;

2x 7 disk raid-z2 with 1 hot spare - 20TB usable
3x 5 disk raid-z2 with no hot spare - 18TB usable
2x 6 disk raid-z2 with 2 hot spares - 16TB usable

with option 1 probably being preferred at the moment.

I am aware that bad batches of disks do exist, so I tend to either a) buy them in sets from different suppliers or b) use different manufacturers. How sensitive to different disks is ZFS, in terms of disk features (NCQ, RPM speed, firmware/software versions, cache etc)?

Thanks
--
This message posted from opensolaris.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Lanky Doodle
>
> can you have one vdev that is a duplicate of another
> vdev? By that I mean, say you had 2x 7 disk raid-z2 vdevs, instead of them
> both being used in one large pool, could you have one that is a backup of the
> other, allowing you to destroy one of them and re-build without data loss?

Well, you can't make a vdev from other vdevs, so you can't make a mirror of raidz, if that's what you were hoping. As Cindy mentioned, you can split mirrors...

Or you could use zfs send | zfs receive to sync one pool to another pool. This would not care whether the architecture of the two pools is the same (the 2nd pool could have different or nonexistent redundancy). But this will be based on snapshots.
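A minimal sketch of the snapshot-based sync, assuming a main pool called "tank" and a backup pool called "backup" (both names are made up):

  # one-time full replication of everything in tank onto the backup pool
  zfs snapshot -r tank@sync-1
  zfs send -R tank@sync-1 | zfs receive -Fd backup

  # later runs only send what changed since the previous snapshot
  zfs snapshot -r tank@sync-2
  zfs send -R -i tank@sync-1 tank@sync-2 | zfs receive -Fd backup

Because the receiving side just stores whatever the stream contains, the backup pool's layout (mirror, raidz, even a single disk) is irrelevant to the copy.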
> I have decided to bite the bullet and change to 2TB disks now rather
> than go through all the effort using 1TB disks and then maybe changing
> in 6-12 months time or whatever. The price difference between 1TB and
> 2TB disks is marginal and I can always re-sell my 6x 1TB disks.
>
> I think I have also narrowed down the raid config to the following;
>
> 2x 7 disk raid-z2 with 1 hot spare - 20TB usable
> 3x 5 disk raid-z2 with no hot spare - 18TB usable
> 2x 6 disk raid-z2 with 2 hot spares - 16TB usable
>
> with option 1 probably being preferred at the moment.

I would choose option 1. I have similar configurations in production. A hot spare can be very good when a drive dies while you're not watching.

> I am aware that bad batches of disks do exist, so I tend to either a)
> buy them in sets from different suppliers or b) use different
> manufacturers. How sensitive to different disks is ZFS, in terms of
> disk features (NCQ, RPM speed, firmware/software versions, cache etc)?

For a home server, it shouldn't make much difference - the network is likely to be the bottleneck anyway. If you mix drives with different spin rates in a pool/vdev, the slower ones will probably pull down performance, so if you're considering "green" drives, use them for all the drives. Mixing Seagate, Samsung and Western Digital drives should work well for this.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
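If you do go with option 1, the spare is just another entry in the pool configuration; a sketch with hypothetical names:

  # add a hot spare that ZFS will pull in automatically when a disk faults
  zpool add tank spare c4t7d0
  zpool status tank

  # for a warm spare instead, leave the disk unconfigured and run a manual
  # "zpool replace tank <failed-disk> <spare-disk>" when something dies

The warm-spare approach avoids the aggressive auto-replacement behaviour Marty described earlier in the thread.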
On Jun 16, 2011, at 2:07 AM, Lanky Doodle wrote:
> Thanks guys.
>
> I have decided to bite the bullet and change to 2TB disks now rather than go through all the effort using 1TB disks and then maybe changing in 6-12 months time or whatever. The price difference between 1TB and 2TB disks is marginal and I can always re-sell my 6x 1TB disks.
>
> I think I have also narrowed down the raid config to the following;
>
> 2x 7 disk raid-z2 with 1 hot spare - 20TB usable
> 3x 5 disk raid-z2 with no hot spare - 18TB usable
> 2x 6 disk raid-z2 with 2 hot spares - 16TB usable
>
> with option 1 probably being preferred at the moment.

Sounds good to me.

> I am aware that bad batches of disks do exist, so I tend to either a) buy them in sets from different suppliers or b) use different manufacturers. How sensitive to different disks is ZFS, in terms of disk features (NCQ, RPM speed, firmware/software versions, cache etc)?

Actually, ZFS has no idea it is talking to a disk. ZFS uses block devices. So there is nothing in ZFS that knows about NCQ, speed, or any of those sorts of attributes. For the current disk drive market, you really don't have much choice... most vendors offer very similar disks.
 -- richard
On Thu, Jun 16, 2011 at 07:06:48PM +0200, Roy Sigurd Karlsbakk wrote:
> > I have decided to bite the bullet and change to 2TB disks now rather
> > than go through all the effort using 1TB disks and then maybe changing
> > in 6-12 months time or whatever. The price difference between 1TB and
> > 2TB disks is marginal and I can always re-sell my 6x 1TB disks.
> >
> > I think I have also narrowed down the raid config to the following;
> >
> > 2x 7 disk raid-z2 with 1 hot spare - 20TB usable
> > 3x 5 disk raid-z2 with no hot spare - 18TB usable
> > 2x 6 disk raid-z2 with 2 hot spares - 16TB usable
> >
> > with option 1 probably being preferred at the moment.
>
> I would choose option 1. I have similar configurations in
> production. A hot spare can be very good when a drive dies while
> you're not watching.

I would probably also go for option 1, with some additional considerations:

1 - are the 2 vdevs in the same pool, or two separate pools?

If the majority of your bulk data can be balanced manually or by application software across 2 filesystems/pools, this offers you the opportunity to replicate smaller, more critical data between pools (and controllers). This offers better protection against whole-pool problems (bugs, fat fingers). With careful arrangement, you could even have one pool spun down most of the time.

You mentioned something early on that implied this kind of thinking, but it seems to have gone by the wayside since. If you can, I would recommend 2 pools if you go for 2 vdevs. Conversely, in one pool, you might as well go for 15xZ3 since even this will likely cover performance needs (and see #4).

2 - disk purchase schedule

With 2 vdevs, regardless of 1 or 2 pools, you could defer purchase of half the 2Tb drives. With 2 pools, you can use the 6x 1Tb and change that later to 7x with the next purchase, with some juggling of data. You might be best to buy 1 more 1Tb to get the shape right at the start for in-place upgrades, and in a single pool this is essentially mandatory. By the time you need more space and buy the second tranche of drives, 3+Tb drives may be the better option.

3 - spare temperature

For levels raidz2 and better, you might be happier with a warm spare and manual replacement, compared to overly-aggressive automated replacement if there is a cascade of errors. See recent threads.

You may also consider a cold spare, leaving a drive bay free for disks-as-backup-tapes swapping. If you replace the 1Tb's now, repurpose them for this rather than reselling.

Whatever happens, if you have a mix of drive sizes, your spare should be of the larger size. Sorry for stating the obvious! :-)

4 - the 16th port

Can you find somewhere inside the case for an SSD as L2ARC on your last port? Could be very worthwhile for some of your other data and metadata (less so the movies).

--
Dan.
> 1 - are the 2 vdevs in the same pool, or two separate
> pools?

I was planning on having the 2 z2 vdevs in one pool. Although having 2 pools and keeping them synced sounds really good, I fear it may be overkill for the intended purpose.

> 3 - spare temperature
>
> for levels raidz2 and better, you might be happier
> with a warm spare and manual replacement, compared to
> overly-aggressive automated replacement if there is a
> cascade of errors. See recent threads.
>
> You may also consider a cold spare, leaving a drive
> bay free for disks-as-backup-tapes swapping. If you
> replace the 1Tb's now, repurpose them for this rather
> than reselling.

I have considered this. The fact I am using cheap disks inevitably means they will fail sooner and more often than enterprise equivalents, so the hot spare may need to be over-used.

Could I have different sized vdevs and still have them both in one pool - i.e. an 8 disk z2 vdev and a 7 disk z2 vdev?

> 4 - the 16th port
>
> Can you find somewhere inside the case for an SSD as
> L2ARC on your last port? Could be very worthwhile for
> some of your other data and metadata (less so the movies).

Yes! I have 10 5.25" drive bays in my case. 9 of them are occupied by the 5-in-3 hot-swap caddies, leaving 1 bay left. I was planning on using one of these http://www.scan.co.uk/products/icy-dock-mb994sp-4s-4in1-sas-sata-hot-swap-backplane-525-raid-cage in the drive bay and having 2x 2.5" SATA drives mirrored for the root pool, leaving 2 drive bays spare. For the mirrored root pool I was going to use 2 of the 6 motherboard SATA II ports so they are entirely separate to the 'data' controllers.

So I could either use the 16th port on the Supermicro controllers for an SSD, or one of the remaining motherboard ports.

What size would you recommend for the L2ARC disk? I ask as I have a 72GB SAS 10k disk spare so could use this for now (being faster than SATA), but it would have to be on the Supermicro card as this also supports SAS drives. SSDs are a bit out of range price wise at the moment so I'd wait to use one. Also, ZFS doesn't support TRIM yet, does it?

Thank you for your excellent post! :)
--
This message posted from opensolaris.org
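Adding (or later removing) a cache device is a simple per-pool operation, and the device needs no redundancy since L2ARC contents can always be re-read from the pool; a sketch with a hypothetical device name for the spare 72GB drive:

  # add the drive as an L2ARC (cache) device
  zpool add tank cache c1t4d0

  # cache devices can be removed again later, e.g. when swapping in an SSD
  zpool remove tank c1t4d0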
Thanks Richard.

How does ZFS enumerate the disks? In terms of listing them, does it do them logically, i.e;

controller #1 (motherboard)
|
|--- disk1
|--- disk2
controller #3
|
|--- disk3
|--- disk4
|--- disk5
|--- disk6
|--- disk7
|--- disk8
|--- disk9
|--- disk10
controller #4
|
|--- disk11
|--- disk12
|--- disk13
|--- disk14
|--- disk15
|--- disk16
|--- disk17
|--- disk18

or is it completely random, leaving me with some trial and error to work out what disk is on what port?
--
This message posted from opensolaris.org
> I was planning on using one of these
> http://www.scan.co.uk/products/icy-dock-mb994sp-4s-4in1-sas-sata-hot-swap-backplane-525-raid-cage

Imagine if 2.5" 2TB disks were price neutral compared to 3.5" equivalents. I could have 40 of the buggers in my system giving 80TB raw storage!!!!!

I'd happily use mirrors all the way in that scenario....
--
This message posted from opensolaris.org
> 4 - the 16th port
>
> Can you find somewhere inside the case for an SSD as
> L2ARC on your last port?

Although saying that, if we are saying hot spares may be bad in my scenario, I could ditch it and use a 3.5" SSD in the 15th drive's place?
--
This message posted from opensolaris.org
On 6/17/2011 12:55 AM, Lanky Doodle wrote:
> Thanks Richard.
>
> How does ZFS enumerate the disks? In terms of listing them, does it do them logically, i.e;
>
> [controller / disk tree snipped]
>
> or is it completely random, leaving me with some trial and error to work out what disk is on what port?

This is not a ZFS issue, this is a Solaris device driver issue. Solaris uses a location-based disk naming scheme, NOT the BSD/Linux style of simply incrementing the disk numbers. I.e. drives are usually named something like c<controller>t<target>d<disk>.

In most cases, the on-board controllers receive a lower controller number than any add-in adapters, and add-in adapters are enumerated in PCI ID order. However, there is no good explanation of exactly *what* number a given controller may be assigned. After receiving a controller number, disks are enumerated in ascending order by ATA ID, SCSI ID, SAS WWN, or FC WWN. The naming rules can get a bit complex.

--
Erik Trimble
Java Platform Group Infrastructure
Mailstop: usca22-317
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)
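A few stock Solaris commands are usually enough to work out which c# corresponds to which physical controller before any data goes on the disks; nothing here is ZFS-specific:

  # list every disk the OS sees, with its c#t#d# name, then exit
  format </dev/null

  # the /dev/dsk entries are symlinks to the physical PCI device paths,
  # which shows which controller each c# number really belongs to
  ls -l /dev/dsk/c*t*d*s0

  # per-disk vendor, model and serial number, handy for matching drive labels
  iostat -En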
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Lanky Doodle
>
> or is it completely random, leaving me with some trial and error to work out
> what disk is on what port?

It's highly desirable to have drives with lights on them, so you can manually make the light blink (or stay on) just by reading the drive with dd.

Even if you dig down and quantify precisely how the drives are numbered and in which order, you would have to find labels printed on the system board or other SATA controllers and trace the spaghetti of the SATA cables, and if you make any mistake along the way, you destroy your pool. (Being dramatic, but not necessarily unrealistic.)

Lights. Good.
> Lights. Good.

Agreed. In a fit of desperation and stupidity I once enumerated disks by pulling them one by one from the array to see which zfs device faulted.

On a busy array it is hard even to use the LEDs as indicators.

It makes me wonder how large shops with thousands of spindles handle this.
--
This message posted from opensolaris.org
On 6/17/2011 6:52 AM, Marty Scholes wrote:
>> Lights. Good.
>
> Agreed. In a fit of desperation and stupidity I once enumerated disks by pulling them one by one from the array to see which zfs device faulted.
>
> On a busy array it is hard even to use the LEDs as indicators.
>
> It makes me wonder how large shops with thousands of spindles handle this.

We pay for the brand-name disk enclosures or servers where the fault-management stuff is supported by Solaris.

Including the blinky lights.

<grin>

--
Erik Trimble
Java Platform Group Infrastructure
Mailstop: usca22-317
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)
Funny you say that.

My Sun v40z connected to a pair of Sun A5200 arrays running OSol 128a can't see the enclosures. The luxadm command comes up blank.

Except for that annoyance (and similar other issues) the Sun gear works well with a Sun operating system.

Sent from Yahoo! Mail on Android
2011-06-18 0:24, marty scholes wrote:
>>> It makes me wonder how large shops with thousands of spindles handle this.
>>
>> We pay for the brand-name disk enclosures or servers where the
>> fault-management stuff is supported by Solaris.
>>
>> Including the blinky lights.
>>
>> <grin>
>
> Funny you say that.
>
> My Sun v40z connected to a pair of Sun A5200 arrays running OSol 128a
> can't see the enclosures. The luxadm command comes up blank.
>
> Except for that annoyance (and similar other issues) the Sun gear
> works well with a Sun operating system.

For the sake of weekend sarcasm: Why would you wonder? That's the wrong brand name, and it is too old. Does it say "Oracle" anywhere on the label? Really, "v40z", pff! When was it made? Like, in the two-thousand-zeros, back when dinosaurs roamed the earth and Sun was high above the horizon? Is it still supported at all, let alone Solaris (not OSol, may I add)?
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Marty Scholes
>
> On a busy array it is hard even to use the LEDs as indicators.

Offline the disk. Light stays off. Use dd to read the disk. Light stays on. That should make it easy enough.

Also, depending on your HBA, lots of times you can blink an amber LED instead of the standard green one.
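The dd trick is just a sustained sequential read of the raw device; a sketch with a hypothetical disk name (on x86, p0 refers to the whole disk):

  # keep one drive's activity light on by reading it continuously
  dd if=/dev/rdsk/c3t5d0p0 of=/dev/null bs=1024k

  # interrupt with Ctrl-C once you have spotted which bay lights up

Doing this on an otherwise idle pool makes the target drive obvious; on a busy pool, offlining the disk first (so its light goes dark) works the other way around.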
On Jun 17, 2011, at 12:55 AM, Lanky Doodle wrote:
> Thanks Richard.
>
> How does ZFS enumerate the disks? In terms of listing them, does it do them logically, i.e;
>
> [controller / disk tree snipped]
>
> or is it completely random, leaving me with some trial and error to work out what disk is on what port?

For all intents and purposes, it is random. Slot locations are the responsibility of the enclosure, not the disk. Until we get a better framework integrated into illumos, you can get the bay location from a SES-compliant enclosure via the fmtopo output, lsiutil, or the sg_utils. For NexentaStor users I provide some automation for this in a KB article on the customer portal. Also for NexentaStor users, DataON offers a GUI plugin called DSM that shows the enclosure, blinky lights, and all of the status information available -- power supplies, fans, etc -- good stuff!

For the curious, fmtopo shows the bay for each disk and the serial number of the disk therein. You can then cross-reference the c*t*d* number for the OS instance to the serial number. Note that for dual-port disks, you can get different c*t*d* numbers for each node connected to the disk (rare, but possible). Caveat: please verify prior to rolling into production that the bay number matches the enclosure silkscreen. The numbers are programmable and different vendors deliver the same enclosure with different silkscreened numbers. As always, the disk serial number is supposed to be unique, so you can test this very easily.

For the later Nexenta, OpenSolaris or Solaris 11 Express releases, the mpt_sas driver will try to light the OK2RM (ok to remove) LED for a disk when you use cfgadm to disconnect the paths. Apparently this also works for SATA disks in an enclosure that manages SATA disks. The process is documented very nicely by Cindy in the ZFS Admin Guide. However, there are a number of enclosures that do not have an OK2RM LED. YMMV.
 -- richard
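The cross-reference Richard describes boils down to comparing two lists; a rough sketch (the fmtopo path shown is typical but may differ by release, and it only helps if the enclosure speaks SES):

  # enclosure topology, including bay numbers and the disks' serial numbers
  /usr/lib/fm/fmd/fmtopo -V | more

  # c#t#d# names with vendor, model and serial number for each disk
  iostat -En

Matching the serial numbers between the two outputs gives the c#t#d#-to-bay mapping.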
Thanks for all the replies. I have a pretty good idea how the disk enclosure assigns slot locations, so should be OK.

One last thing - I see that Supermicro has just released a newer version of the card I mentioned in the first post that supports SATA 6Gbps. From what I can see it uses the Marvell 9480 controller, which I don't think is supported in Solaris 11 Express yet.

Does this mean it strictly won't work (i.e. no available drivers) or that it just wouldn't be supported if there are problems?
--
This message posted from opensolaris.org
Sorry to pester, but is anyone able to say if the Marvell 9480 chip is now supported in Solaris? The article I read saying it wasn't supported was dated May 2010, so over a year ago.
--
This message posted from opensolaris.org
OK, I have finally settled on hardware;

2x LSI SAS3081E-R controllers
2x Seagate Momentus 5400.6 rpool disks
15x Hitachi 5K3000 'data' disks

I am still undecided as to how to group the disks. I have read elsewhere that raid-z1 is best suited to either 3 or 5 disks and raid-z2 is better suited to 6 or 10 disks - is there any truth in this (although I think this was in reference to 4K sector disks)?

3x 5 drive z1 = 24TB usable
2x 6 drive z2 = 16TB usable

keeping to those recommendations, or

2x 7 disk z2 = 20TB usable with 1 cold/warm/hot spare

as per my original idea.
--
This message posted from opensolaris.org
The LSI2008 chipset is supported and works very well.

I would actually use 2 vdevs, 8 disks in each, and I would configure each vdev as raidz2. Maybe use one hot spare.

And I also have personal, subjective reasons: I like to use the number 8 in computers. 7 is an ugly number. Everything is based on powers of 2 in computers. A pocket calculator which only accepts the digits 1-8, but not the digit "9", is really ugly (having 7 discs, but not 8, is ugly). Some time ago, there was a problem unless you used an even number of discs; that problem is corrected now.

I would definitely use raidz2, because resilver time will be very long with 4-5TB disks, potentially several days. During that time, another disk problem such as a read error might occur, which means you lose all your data.
--
This message posted from opensolaris.org
Thanks.

I ruled out the SAS2008 controller as my motherboard is only PCIe 1.0, so I would not have been able to make the most of the increased bandwidth.

I can't see myself upgrading every few months (my current WHS build has lasted over 4 years without a single change), so by the time I do come to upgrade, PCIe will probably be obsolete!!
--
This message posted from opensolaris.org
On Tue, Jul 5, 2011 at 6:54 AM, Lanky Doodle <lanky_doodle at hotmail.com> wrote:
> OK, I have finally settled on hardware;
>
> 2x LSI SAS3081E-R controllers
> 2x Seagate Momentus 5400.6 rpool disks
> 15x Hitachi 5K3000 'data' disks
>
> I am still undecided as to how to group the disks. I have read elsewhere that raid-z1 is best suited to either 3 or 5 disks and raid-z2 is better suited to 6 or 10 disks - is there any truth in this, although I think this was in reference to 4K sector disks;
>
> 3x 5 drive z1 = 24TB usable
> 2x 6 drive z2 = 16TB usable

Take a look at https://spreadsheets.google.com/spreadsheet/ccc?key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&hl=en_US

I did a bunch of testing with 40 drives. I varied the configuration between extremes of 10 vdevs of 4 disks each to one vdev of all 40 drives. All vdevs were raidz2, so my net capacity changed, but I was looking for _relative_ performance differences. I did not test sequential reads, as that was not one of our expected I/O patterns. I believe the OS was Solaris 10U8. I know it was at least zpool version 15 and may have been 22.

I used the same 40 drives in all the test cases, as I had seen differences between drives, and chose 40 that all had roughly matching svc_t values (from iostat). Eventually we had Sun/Oracle come in and replace any drive whose svc_t was substantially higher than the others (these drives also usually had lots of added bad blocks mapped).

> keeping to those recommendations, or
>
> 2x 7 disk z2 = 20TB usable with 1 cold/warm/hot spare

The testing utilized a portion of our drives; we have 120 x 750GB SATA drives in J4400s, dual pathed. We ended up with 22 vdevs, each a raidz2 of 5 drives, with one drive in each of the J4400s, so we can lose two complete J4400 chassis and not lose any data.

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Tue, Jul 5, 2011 at 7:47 AM, Lanky Doodle <lanky_doodle at hotmail.com> wrote:
> Thanks.
>
> I ruled out the SAS2008 controller as my motherboard is only PCIe 1.0, so I would not have been able to make the most of the increased bandwidth.

Only PCIe 1.0? What chipset is that based on? It might be worthwhile to upgrade, as I believe Solaris power management has a fairly recent cutoff in terms of processor support (AMD Family 16 or better, Intel Nehalem or newer is what I've been told). PCIe 2.0 has been around for quite a while; PCIe 3.0 will be making an appearance on Ivy Bridge CPUs (and has already been announced by FPGA vendors), but I'm fairly confident that graphics cards will be the first target market to utilize that.

Another thing to consider is that you could buy the SAS2008-based cards and move them from motherboard to motherboard for the foreseeable future (copper PCI Express isn't going anywhere for a long time). Don't kneecap yourself because of your current mobo.

--khd
On Tue, 5 Jul 2011, Lanky Doodle wrote:
>
> I am still undecided as to how to group the disks. I have read
> elsewhere that raid-z1 is best suited to either 3 or 5 disks and
> raid-z2 is better suited to 6 or 10 disks - is there any truth in
> this, although I think this was in reference to 4K sector disks;

The decision to use raid-z1 should be based on the type and size of the drives. If you are using small enterprise-class SAS drives then raid-z1 is ok, but if you are using large near-line SAS/SATA or large desktop SATA drives then you should use raid-z2 instead. The reason for this is that you don't want to experience the case where the remaining drives in a raid-z1 encounter a failure while you are resilvering to replace a failed drive.

If you have a very good backup system and can afford to restore the whole zfs pool from scratch, then that might be an argument to use raid-z1.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Tue, Jul 5, 2011 at 12:54 PM, Lanky Doodle <lanky_doodle at hotmail.com> wrote:
> OK, I have finally settled on hardware;
>
> 2x LSI SAS3081E-R controllers

Beware that this controller does not support drives larger than 2TB.

--
Trond Michelsen
Thanks Trond.

I am aware of this, but to be honest I will not be upgrading very often (my current WHS setup has lasted 5 years without a single change!) and certainly not with each iteration of TB size increase, so by the time I do upgrade, say in the next 5 years, PCIe will have probably been replaced, or got to revision 10.0 or something stupid!

And anyway, my current motherboard (expensive server board) is only PCIe 1.0, so I wouldn't get the benefit of having a PCIe 2.0 card.
--
This message posted from opensolaris.org
> The testing utilized a portion of our drives; we
> have 120 x 750GB SATA drives in J4400s, dual pathed.
> We ended up with 22 vdevs, each a raidz2 of 5 drives,
> with one drive in each of the J4400s, so we can
> lose two complete J4400 chassis and not lose any
> data.

Thanks pk.

You know, I never thought about doing 5 drive z2's. That would be an acceptable compromise for me against 2x 7 drive z2's, as;

1) resilver times should be faster
2) 5 drive groupings, matching my 5 drive caddies
3) only losing 2TB usable against 2x 7 drive z2's
4) IOPS should be faster
5) if and when I scale up, I can add another 5 drives, in another 5 drive caddy

Super!
--
This message posted from opensolaris.org