Hi,

I have seen a similar question on this list in the archive but haven't seen the answer.

Can I avoid striping across top-level vdevs?

I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it. But I cannot guarantee that the new LUN will not come from the same spindles on the SAN.

Can I force zpool not to stripe the data?

Thank you in advance,
Zsolt Habony
On 18/10/2010 07:44, Habony, Zsolt wrote:
> I have seen a similar question on this list in the archive but haven't
> seen the answer.
>
> Can I avoid striping across top level vdevs ?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full
> I add a new LUN to it.
>
> But I cannot guarantee that the LUN will not come from the same spindles
> on the SAN.

That sounds like a problem with your SAN config if that matters to you.

> Can I force zpool to not to stripe the data ?

You can't, but why do you care?

--
Darren J Moffat
In many large datacenters, a different storage team handles LUN requests and assignment. We ask for a LUN of a specific size, and we get one.

It might turn out that the first vdev (LUN) is at the beginning of a RAID set on the storage, and the second vdev is at the end of the same RAID set, on the same physical disks (if not at creation time, then later, when a filled zpool is grown by adding a LUN).

I worry about head thrashing. Though the memory cache of a large storage array should ease the problem, I would be happier if I could be sure that the zpool will not be handled as a stripe.

Is there a way to avoid it, or can we be sure that the problem does not exist at all?

-----Original Message-----
From: Darren J Moffat [mailto:darrenm at opensolaris.org]
Sent: 18 October 2010 10:19
To: Habony, Zsolt
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] How to avoid striping ?

On 18/10/2010 07:44, Habony, Zsolt wrote:
> I have seen a similar question on this list in the archive but haven't
> seen the answer.
>
> Can I avoid striping across top level vdevs ?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full
> I add a new LUN to it.
>
> But I cannot guarantee that the LUN will not come from the same spindles
> on the SAN.

That sounds like a problem with your SAN config if that matters to you.

> Can I force zpool to not to stripe the data ?

You can't, but why do you care?

--
Darren J Moffat
On 18 Oct 2010, at 08:44, "Habony, Zsolt" <zsolt.habony at hp.com> wrote:
> Hi,
>
> I have seen a similar question on this list in the archive but haven't seen the answer.
>
> Can I avoid striping across top level vdevs ?
>
> If I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it.
> But I cannot guarantee that the LUN will not come from the same spindles on the SAN.
>
> Can I force zpool to not to stripe the data ?

No. The basic principle of the zpool is dynamic striping across vdevs, in order to ensure that all available spindles contribute to the workload. If you want or need more granular control over which data goes to which disk, then you'll need to create multiple pools.

Just create a new pool from the new SAN volume and you will segregate the IO. But then you risk having hot and cold spots in your storage, as the IO won't be striped. If the approach is to fill one vdev completely before adding a new one, that possibility exists anyway, until block rewrite arrives to redistribute existing data across the available vdevs.

Cheers,
Erik
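As a rough sketch of the two options Erik mentions (the pool and device names below are made up for illustration, not taken from the thread):

    # Hypothetical pool/LUN names.
    # Adding the new LUN to the existing pool creates a second top-level
    # vdev, and ZFS dynamically stripes new writes across both:
    zpool add apppool c3t60060E801234d1

    # Creating a separate pool from the new LUN keeps its IO segregated,
    # at the cost of managing two filesystem hierarchies:
    zpool create apppool2 c3t60060E801234d1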
> No. The basic principle of the zpool is dynamic striping across vdevs in order to ensure that all available spindles
> are contributing to the workload. If you want/need more granular control over what data goes to which disk, then
> you'll need to create multiple pools.
>
> Just create a new pool from the new SAN volume and you will segregate the IO.

That's my understanding, and that's my problem.

You have an application filesystem from one LUN. (vxfs is expensive, and ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.) When it fills up, you increase it by adding a new LUN. You have to make sure that the added LUN is from different physical disks, which might not be obvious with today's large storage arrays with thousands of LUNs.

If I can force concatenation, then I do not have to investigate where the existing parts of the filesystem are.
On 18/10/2010 09:28, Habony, Zsolt wrote:
> I worry about head thrashing. Though memory cache of large storage should make the problem

Is that really something you should be worried about, with all the other software and hardware between ZFS and the actual drives?

If it is a problem then it isn't ZFS causing it; ZFS will just be using the LUNs it was given by the SAN. An access pattern from an application on a completely different filesystem could still end up using both LUNs in that way.

> Is there a way to avoid it, or can we be sure that the problem does not exist at all ?

Grow the existing LUN rather than adding another one.

The only way to have ZFS not stripe is to not give it devices to stripe over. So stick with simple mirrors, e.g. this style of configuration:

  pool: builds
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        builds      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0

where in your configuration c7t3d0/c8t4d0 are your LUNs from the SAN. Rather than this style:

  pool: builds
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        builds      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0

--
Darren J Moffat
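A minimal sketch of how these two layouts come about; the first pair of LUN names follows Darren's example, and c9t5d0 is a hypothetical extra LUN:

    # Single mirrored top-level vdev - no striping:
    zpool create builds mirror c7t3d0 c8t4d0

    # "zpool attach" adds another side to the existing mirror,
    # so the pool still has only one top-level vdev:
    zpool attach builds c7t3d0 c9t5d0

    # "zpool add builds mirror <lunA> <lunB>" would instead create
    # mirror-1 as a second top-level vdev, and the pool would then
    # stripe across both.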
On 18/10/2010 10:01, Habony, Zsolt wrote:
> If I can force concatenation, then I do not have to investigate where the existing parts of the filesystem are.

You can't; the code for concatenation rather than striping does not exist, and there are no plans to add it.

Instead of assuming you have a problem, I'd highly recommend you go with the recommendation in my other email, or don't worry about it. Don't assume that you will have a problem with ZFS because of your experience with other systems. Striping isn't bad; it is usually good.

Or fix the root cause of the problem - which in this example case isn't ZFS - on the SAN where the LUNs are getting allocated.

--
Darren J Moffat
On Mon, Oct 18, 2010 at 1:28 AM, Habony, Zsolt <zsolt.habony at hp.com> wrote:
> Is there a way to avoid it, or can we be sure that the problem does not exist at all ?

ZFS will coalesce asynchronous writes, which should help with most of the head thrashing on writes. Using a log device will convert sync writes to async. For reads, make sure you have enough memory and a cache device.

-B

--
Brandon High : bhigh at freaks.com
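A small sketch of Brandon's suggestion, assuming a pool named apppool and hypothetical devices for the log and cache:

    # Separate intent-log (slog) device: synchronous writes are committed
    # here first and then flushed to the main LUN with the normal async writes:
    zpool add apppool log c4t0d0

    # L2ARC cache device to absorb read traffic in front of the SAN LUN:
    zpool add apppool cache c4t1d0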
Hi,

Habony, Zsolt writes:
> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)

What do you mean by "not really"?

Use metattach to grow a metadevice or soft partition. Use growfs to grow UFS on the grown device.

Rainer
--
Rainer J. H. Brandt
Brandt & Brandt Computer GmbH
Am Wiesenpfad 6, 53340 Meckenheim
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
Handelsregister: Amtsgericht Bonn, HRB 10513

RFC 5322: "Each line [...] SHOULD be no more than 78 characters"
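A brief sketch of the SVM/UFS growth path Rainer describes, with hypothetical metadevice, slice, and mount-point names:

    # Concatenate an additional slice onto the existing metadevice:
    metattach d10 c2t1d0s0

    # Grow the mounted UFS filesystem to fill the enlarged device:
    growfs -M /app /dev/md/rdsk/d10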
On 10/18/10 2:13 AM, Rainer J.H. Brandt wrote:
>
> Habony, Zsolt writes:
>> You have an application filesystem from one LUN. (vxfs is
>> expensive, ufs/svm is not really able to handle online filesystem
>> increase. Thus we plan to use zfs for application filesystems.)
>
> What do you mean by "not really"? Use metattach to grow a metadevice
> or soft partition. Use growfs to grow UFS on the grown device.

He is probably referring to the fact that growfs locks the filesystem.

--
Carson Gaspar
>> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)
>
> What do you mean by "not really"?
> ...
> Use growfs to grow UFS on the grown device.

I know it's off-topic, but the statement "growfs will ``write-lock'' (see lockfs(1M)) a mounted file system when expanding" has always made me uncomfortable with this online expansion. I cannot guarantee how a specific application will behave during the expansion.
>> Is there a way to avoid it, or can we be sure that the problem does not exist at all ?
>
> Grow the existing LUN rather than adding another one.
>
> The only way to have ZFS not stripe is to not give it devices to stripe
> over. So stick with simple mirrors ...

(I do not mirror, as the storage gives redundancy behind LUNs.)

Online LUN expansion seems promising, and answers my question. Thank you for that.

Zsolt
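For reference, a sketch of what the host-side part of an online LUN expansion might look like once the storage team has grown the LUN; the pool and device names are hypothetical, and the autoexpand property assumes a reasonably recent zpool version:

    # Pick up the extra capacity automatically when the device is reopened ...
    zpool set autoexpand=on apppool

    # ... or expand the single device explicitly after the LUN resize:
    zpool online -e apppool c3t60060E801234d0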
>>> You have an application filesystem from one LUN. (vxfs is expensive, ufs/svm is not really able
>>> to handle online filesystem increase. Thus we plan to use zfs for application filesystems.)
>>
>> What do you mean by "not really"?
>> ...
>> Use growfs to grow UFS on the grown device.
>
> I know it's off-topic, but the statement "growfs will ``write-lock''
> (see lockfs(1M)) a mounted file system when expanding" has always made me
> uncomfortable with this online expansion. I cannot guarantee how a
> specific application will behave during the expansion.

     -w          Write-lock (wlock) the specified file system. wlock
                 suspends writes that would modify the file system.
                 Access times are not kept while a file system is
                 write-locked.

All the applications trying to write will suspend. What would be the risk of that?

Casper
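The same write-lock can be exercised by hand with lockfs(1M); a small sketch, with a hypothetical mount point:

    # Suspend writes to the filesystem, as growfs does while expanding:
    lockfs -w /app

    # Release the write-lock:
    lockfs -u /app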
On 18 Oct 2010, at 12:40, Habony, Zsolt wrote:
>>> Is there a way to avoid it, or can we be sure that the problem does not exist at all ?
>>
>> Grow the existing LUN rather than adding another one.
>>
>> The only way to have ZFS not stripe is to not give it devices to stripe
>> over. So stick with simple mirrors ...
>
> (I do not mirror, as the storage gives redundancy behind LUNs.)

Then you lose ZFS's self-healing ability.

Sami
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Habony, Zsolt
>
> If I use a zpool which is one LUN from the SAN, and when
> it becomes full I add a new LUN to it.
> But I cannot guarantee that the LUN will not come from the same
> spindles on the SAN.
>
> Can I force zpool to not to stripe the data ?

If at all possible, you should request that your LUN team give you whole disks, JBOD, instead of a LUN slice of some other raid set. The performance and reliability benefits of ZFS raid over HW raid have been discussed here many times. Please ask if you don't already know.

If they can't do that for you ... then your question is an important one, and I have no idea of the answer.
On 10/18/2010 4:28 AM, Habony, Zsolt wrote:
>
> I worry about head thrashing.

Why?

If your SAN group gives you a LUN that is at the opposite end of the array, I would think that was because they had already assigned the space in the middle to other customers (other groups like yours, or other hosts of yours). If so, don't you think that all those other hosts and customers will be reading and writing from that array all the time anyway?

I mean, if the heads are going to 'thrash', then they'll be doing so even before you request your second LUN, right? Adding your second LUN to the mix isn't going to seriously change the workload on the disks in the array.

> Though memory cache of large storage should make the problem
> easier, I would be more happy if I can be sure that zpool will not
> be handled as a stripe.
>
> Is there a way to avoid it, or can we be sure that the problem does
> not exist at all ?

As I think the logic above suggests, if the problem exists, it exists even when you only have 1 LUN.

-Kyle
On 10/18/2010 5:40 AM, Habony, Zsolt wrote:
> (I do not mirror, as the storage gives redundancy behind LUNs.)

By not enabling redundancy (mirror or RAIDZ[123]) at the ZFS level, you are opening yourself to corruption problems that the underlying SAN storage can't protect you from. The SAN array won't even notice the problem. ZFS will notice the problem, and (if you don't give it redundancy to work with) it won't be able to repair it for you.

You'd be better off getting unprotected LUNs from the array, and letting ZFS handle the redundancy.

-Kyle

> Online LUN expansion seems promising, and answers my question.
> Thank you for that.
>
> Zsolt
Thank you all for the comments.

You should imagine a datacenter with
- standards not completely depending on me,
- a SAN for many OSs, one of which is Solaris (and not the major part),
- usually level 2 engineers doing filesystem increases,
- hundreds of physical boxes, dozens of virtuals on one physical,
- the ability to move VMs (zones) across physical boxes (by assigning LUNs to other boxes).

That probably explains why I cannot use host-based raid management; it is done by the storage as standard. I cannot assign whole disks to boxes, as I get LUNs standardized for all the other OSs, in a size optimized for small virtual machines. zfs is just used for easy expansion and snapshotting.

> If your SAN group gives you a LUN that is at the opposite end of the array, I would think that was because they had
> already assigned the space in the middle to other customers (other groups like yours, or other hosts of yours).
>
> Adding your second LUN to the mix isn't going to seriously change the workload on the disks in the array.

Though I agree that I cannot guarantee what other hosts are doing on my LUNs, I would still avoid striping over partitions on the same disk. A possible bad thing is better than an absolutely sure bad thing.

On 10/18/2010 5:40 AM, Habony, Zsolt wrote:
>> (I do not mirror, as the storage gives redundancy behind LUNs.)
>
> By not enabling redundancy (Mirror or RAIDZ[123]) at the ZFS level,
> you are opening yourself to corruption problems that the underlying
> SAN storage can't protect you from. The SAN array won't even notice
> the problem.

I cannot redefine our standards here. Maybe zfs does some things better than the storage, but having standards for all the other OSs also gives advantages, and yes, I know we sacrifice some useful zfs features.

I hope that explains it, and thank you again for all your valuable comments.

Zsolt
On Mon, Oct 18, 2010 at 3:28 AM, Habony, Zsolt <zsolt.habony at hp.com> wrote:
> In many large datacenters, a different storage team handles LUN requests
> and assignment.
> We ask a LUN in a specific size, and we get one.
>
> It might result that the first vdev (LUN) is on a beginning of a RAID set
> on the storage, and the second vdev is on the end of the same RAID set on
> the same physical disks. (If not in the creation time, then later, during
> the increase of a filled zpool, by adding a LUN)
>
> I worry about head thrashing. Though memory cache of large storage should
> make the problem easier, I would be more happy if I can be sure that
> zpool will not be handled as a stripe.
>
> Is there a way to avoid it, or can we be sure that the problem does not
> exist at all ?

It shouldn't matter if LUNs are on the same backend disk. Unless the manufacturer of the array is brain dead, their wide-striping algorithm should handle it without breaking a sweat. If the pool of disks can't service the number of IOPS, the "storage team" should be moving LUNs around; that's what they get paid to do.

Your *issue* shouldn't be an issue at all unless the backend disk is junk. I've never seen an issue with Hitachi's HDP or NetApp's aggregates.

--Tim
On 2010-Oct-18 17:45:34 +0800, Casper.Dik at Sun.COM wrote:
>      -w          Write-lock (wlock) the specified file system. wlock
>                  suspends writes that would modify the file system.
>                  Access times are not kept while a file system is
>                  write-locked.
>
> All the applications trying to write will suspend. What would be the
> risk of that?

At least some versions of the Oracle RDBMS have timeouts around I/O and will abort if I/O operations don't complete within a short period.

--
Peter Jeremy
On 18 Oct 2010, at 17:44, Habony, Zsolt wrote:
> Thank You all for the comments.
>
> You should imagine a datacenter with
> - standards not completely depending on me.
> - SAN for many OSs, one of them is Solaris, (and not the major amount)

So you get LUNs from the storage team and there is nothing you can do about it. Then just use the LUNs you get as well as you can, which means a host-based mirrored zpool.

> - usually level 2 engineers doing filesystem increases.
> - hundreds of physical boxes, dozens of virtuals on one physical
> - ability to move VMs (zones) across physical boxes. (by assigning LUNs to other boxes)

You can do that even if the raid management is done host-based with zfs.

> That probably explains, that I cannot use host based raid management, it is done by storage as standard.

No, it does not. I would still let zfs do the raid management on the host side, even if you can't stop the storage team from raiding it again on the storage box.

> I cannot assign whole disks to boxes, as I get LUNs standardized for all other OSs, and in a size optimized for
> small virtual machines.

You should still mirror across two storage boxes.

Sami
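A sketch of what Sami suggests, assuming one LUN from each of two arrays; the device names are made up:

    # Host-side mirror of two LUNs, one from each storage box, so ZFS keeps
    # its self-healing ability even though each array does RAID internally:
    zpool create apppool mirror c3t600A0B8000111111d0 c5t600A0B8000222222d0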