Hi all

I have this server with some 50TB of disk space. It originally had 30TB
on WD Greens, was filled quite full, and another storage chassis was
added. Now the space problem is gone, fine, but what about speed? Three
of the VDEVs are quite full, as indicated below. VDEV #3 (the one with
the spare active) just spent some 72 hours resilvering a 2TB drive. Now,
those green drives suck quite hard, but not _that_ hard. I'm guessing
the reason for this slowdown is how full those first three VDEVs are.

Now, is there a way, manually or automatically, to somehow balance the
data across these LVOLs? My first guess is that doing this
_automatically_ will require block pointer rewrite, but then, is there a
way to hack this thing by hand?

PS: Yeah, I know, there are more disks on the fourth VDEV than on the
first three, but this was how we chose to do it.
PPS: c10d1s1 and c11d0s1 are the SLOG mirror, even though zpool iostat
doesn't show it (zpool status does).

root at urd:~# zpool iostat -v
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
dpool         38.7T  20.9T    302     39  17.3M  2.20M
  raidz2      12.1T   552G     81      3  4.72M   205K
    c7t2d0        -      -     34      2  1.18M  41.5K
    c7t3d0        -      -     34      2  1.18M  41.5K
    c7t4d0        -      -     33      2  1.18M  41.5K
    c7t5d0        -      -     33      2  1.18M  41.5K
    c7t6d0        -      -     34      2  1.18M  41.5K
    c7t7d0        -      -     33      2  1.18M  41.5K
    c8t0d0        -      -     35      2  1.18M  41.5K
  raidz2      12.4T   277G     84      5  4.81M   278K
    c8t1d0        -      -     35      3  1.22M  56.1K
    c8t2d0        -      -     34      3  1.22M  56.1K
    c8t3d0        -      -     35      3  1.22M  56.1K
    c8t4d0        -      -     35      3  1.22M  56.1K
    c8t5d0        -      -     34      3  1.22M  56.1K
    c8t6d0        -      -     35      3  1.22M  56.1K
    c8t7d0        -      -     34      3  1.22M  56.1K
  raidz2      12.0T   631G    101      7  6.50M   294K
    c9t0d0        -      -     39      3  1.56M  58.7K
    c9t1d0        -      -     39      3  1.56M  58.7K
    c9t2d0        -      -     39      3  1.56M  58.7K
    c9t3d0        -      -     39      3  1.56M  58.7K
    spare         -      -    472     42  7.16M  83.9K
      c9t4d0      -      -     39      3  1.56M  58.7K
      c9t7d0      -      -      0    259      2  6.85M
    c9t5d0        -      -     39      3  1.56M  58.7K
    c9t6d0        -      -     38      3  1.56M  58.7K
  mirror      11.8M  4.96G      0     14      0  1.03M
    c10d1s0       -      -      0     14      0  1.03M
    c11d0s0       -      -      0     14      0  1.03M
  raidz2      2.24T  19.5T     33      8  1.23M   417K
    c14t9d0       -      -     11      2   212K  42.3K
    c14t10d0      -      -     11      2   208K  42.3K
    c14t11d0      -      -     12      2   211K  42.3K
    c14t12d0      -      -     11      2   211K  42.3K
    c14t13d0      -      -     11      2   207K  42.3K
    c14t14d0      -      -     12      2   211K  42.3K
    c14t15d0      -      -     11      2   211K  42.3K
    c14t16d0      -      -     11      2   208K  42.3K
    c14t17d0      -      -     12      2   212K  42.3K
    c14t18d0      -      -     11      2   212K  42.3K
    c14t19d0      -      -     11      2   209K  42.3K
    c14t20d0      -      -     12      2   211K  42.3K
cache               -      -      -      -      -      -
  c10d1s1     69.5G  7.88M      6      6   314K   765K
  c11d0s1     69.5G  6.59M      6      6   313K   766K

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt.
Det er et elementært imperativ for alle pedagoger å unngå eksessiv
anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller
eksisterer adekvate og relevante synonymer på norsk.
Obviously, I meant VDEVs, not LVOLs... It's been a long day...

----- Original Message -----
> Now, is there a way, manually or automatically, to somehow balance the
> data across these LVOLs? My first guess is that doing this
> _automatically_ will require block pointer rewrite, but then, is there a
> way to hack this thing by hand?
[snip]

--
Vennlige hilsener / Best regards

roy
On Tue, Oct 19, 2010 at 7:13 PM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:
> I have this server with some 50TB of disk space. It originally had 30TB
> on WD Greens, was filled quite full, and another storage chassis was
> added. Now the space problem is gone, fine, but what about speed? Three
> of the VDEVs are quite full, as indicated below. VDEV #3 (the one with
> the spare active) just spent some 72 hours resilvering a 2TB drive. Now,
> those green drives suck quite hard, but not _that_ hard. I'm guessing
> the reason for this slowdown is how full those first three VDEVs are.
>
> Now, is there a way, manually or automatically, to somehow balance the
> data across these LVOLs? My first guess is that doing this
> _automatically_ will require block pointer rewrite, but then, is there a
> way to hack this thing by hand?

I described a similar issue in
http://opensolaris.org/jive/thread.jspa?threadID=134581&tstart=30. My
solution was to copy some datasets over to a new directory, delete the
old ones and destroy any snapshots that retain them. Data is read from
the old devices and written across all of them, causing large chunks of
space to be freed on the old devices.

I wished for a more aggressive write balancer, but that may be too much
to ask for.
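In rough commands - the names are made up and this is only a sketch of
the idea, not exactly what I typed - it boils down to:

  # tank/data stands in for a dataset that currently sits mostly on the
  # old, full vdevs; copying rewrites every block, and the new writes
  # are spread across all vdevs, favouring the emptier one
  zfs create tank/data.new
  rsync -a /tank/data/ /tank/data.new/

  # the old blocks are only freed once nothing references them any more
  zfs list -t snapshot -r tank/data     # review what still holds them
  zfs destroy tank/data@some-snapshot   # repeat for each old snapshot
  zfs destroy tank/data
  zfs rename tank/data.new tank/data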
On Oct 20, 2010, at 2:24 AM, Tuomas Leikola wrote:
> On Tue, Oct 19, 2010 at 7:13 PM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:
>> I have this server with some 50TB of disk space. It originally had 30TB
>> on WD Greens, was filled quite full, and another storage chassis was
>> added. Now the space problem is gone, fine, but what about speed? Three
>> of the VDEVs are quite full, as indicated below. VDEV #3 (the one with
>> the spare active) just spent some 72 hours resilvering a 2TB drive. Now,
>> those green drives suck quite hard, but not _that_ hard. I'm guessing
>> the reason for this slowdown is how full those first three VDEVs are.
>>
>> Now, is there a way, manually or automatically, to somehow balance the
>> data across these LVOLs? My first guess is that doing this
>> _automatically_ will require block pointer rewrite, but then, is there a
>> way to hack this thing by hand?
>
> I described a similar issue in
> http://opensolaris.org/jive/thread.jspa?threadID=134581&tstart=30. My
> solution was to copy some datasets over to a new directory, delete the
> old ones and destroy any snapshots that retain them. Data is read from
> the old devices and written across all of them, causing large chunks of
> space to be freed on the old devices.
>
> I wished for a more aggressive write balancer, but that may be too much
> to ask for.

This can, of course, be tuned. Would you be interested in characterizing
the benefits and costs of a variety of such tunings?
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
USENIX LISA '10 Conference, November 7-12, San Jose, CA
ZFS and performance consulting
http://www.RichardElling.com
On Wed, Oct 20, 2010 at 5:00 PM, Richard Elling <richard.elling at gmail.com> wrote:
>>> Now, is there a way, manually or automatically, to somehow balance the
>>> data across these LVOLs? My first guess is that doing this
>>> _automatically_ will require block pointer rewrite, but then, is there a
>>> way to hack this thing by hand?
>>
>> I described a similar issue in
>> http://opensolaris.org/jive/thread.jspa?threadID=134581&tstart=30. My
>> solution was to copy some datasets over to a new directory, delete the
>> old ones and destroy any snapshots that retain them. Data is read from
>> the old devices and written across all of them, causing large chunks of
>> space to be freed on the old devices.
>>
>> I wished for a more aggressive write balancer, but that may be too much
>> to ask for.
>
> This can, of course, be tuned. Would you be interested in characterizing
> the benefits and costs of a variety of such tunings?

If you're asking whether I wish to test and document my findings with
such tunables, then yes, I'm interested, though this is a home file
server, so it's not exactly a laboratory environment. I also think I can
produce enough spare parts to do synthetic tests (maybe in a VM
environment).

I was not aware of such tunables, though it appeared there might be some
emergency mode when a vdev has only a few percent of space left. My
server is currently running OI_147, but I haven't yet upgraded the pool,
so it's still version 14. I also have 111b and 134 boot environments
standing by.

--
- Tuomas
On Wed, October 20, 2010 04:24, Tuomas Leikola wrote:

> I wished for a more aggressive write balancer but that may be too much
> to ask for.

I don't think it can be too much to ask for. Storage servers have long
enough lives that adding disks to them is a routine operation; to the
extent that that's a problem, that really needs to be fixed.

However, it's not the sort of thing one should hold one's breath waiting
for!

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
On 2010-Oct-21 01:28:46 +0800, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> On Wed, October 20, 2010 04:24, Tuomas Leikola wrote:
>
>> I wished for a more aggressive write balancer but that may be too much
>> to ask for.
>
> I don't think it can be too much to ask for. Storage servers have long
> enough lives that adding disks to them is a routine operation; to the
> extent that that's a problem, that really needs to be fixed.

It will (should) arrive as part of the mythical block pointer rewrite
project.

--
Peter Jeremy
On Thu, Oct 21, 2010 at 12:06 AM, Peter Jeremy
<peter.jeremy at alcatel-lucent.com> wrote:
> On 2010-Oct-21 01:28:46 +0800, David Dyer-Bennet <dd-b at dd-b.net> wrote:
>> On Wed, October 20, 2010 04:24, Tuomas Leikola wrote:
>>
>>> I wished for a more aggressive write balancer but that may be too much
>>> to ask for.
>>
>> I don't think it can be too much to ask for. Storage servers have long
>> enough lives that adding disks to them is a routine operation; to the
>> extent that that's a problem, that really needs to be fixed.
>
> It will (should) arrive as part of the mythical block pointer rewrite
> project.

Actually, BP rewrite would be needed for rebalancing data after the
fact, whereas I was referring to write balancing that tries to mitigate
the problem before it occurs.

I was thinking of a tunable like "writebalance=conservative|aggressive",
where conservative would be the current mode and aggressive would be
something like aiming for all devices to reach 90% full at exactly the
same time, and avoiding writes to devices over 90% altogether. The 90%
limit is of course arbitrary, but it commonly seems to be some kind of
tipping point.

The downside of aggressive balancing would of course be lower write
bandwidth, and the data written would not be striped across all vdevs,
so subsequent reads might also take a hit. The impact would depend
heavily on the usage pattern, obviously, but I expect most use cases
would not suffer much from this, and it is arguable whether somewhat
reduced bandwidth now is worse than a serious write slowdown further
down the road - the difference seems to be orders of magnitude, anyway.
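To make that concrete, here is a back-of-the-envelope reading of the
alloc/free numbers from Roy's iostat output (rounded, just to illustrate
the idea):

  raidz2 #1:  12.1T alloc /  552G free  ->  roughly 96% full
  raidz2 #2:  12.4T alloc /  277G free  ->  roughly 98% full
  raidz2 #3:  12.0T alloc /  631G free  ->  roughly 95% full
  raidz2 #4:  2.24T alloc / 19.5T free  ->  roughly 10% full

With a 90% cutoff, the aggressive policy would send essentially all new
writes to the fourth vdev until it, too, approached 90%, after which
allocation would spread out again.

--
- Tuomas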
On Tue, Oct 19, 2010 at 6:13 AM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:
> Now, is there a way, manually or automatically, to somehow balance the
> data across these LVOLs? My first guess is that doing this
> _automatically_ will require block pointer rewrite, but then, is there a
> way to hack this thing by hand?

You could send | receive some datasets to the same system, then destroy
the original and rename the new copy back to the original location
(rough sketch below). Or send a dataset to a different system, destroy
the original, and then send it back again. Most of the new copy should
end up on the new vdev, which will help balance things some.

Of course, since the new copy is still mostly on one vdev, it may not
have better read performance, but future writes will be able to spread
across all the vdevs.

You could continue to do this until you feel that the datasets have been
balanced out. Imagine mixing two fluids by pouring them back and forth
between two glasses - after a few times it'll be a homogeneous solution.

This makes me wonder if a 'ghetto bp_rewrite' would be possible by
simply preventing future writes to one vdev and duplicating (even via
send | receive) all the blocks that are stored on that vdev.
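Something like this - the dataset names are placeholders, and make sure
no clones or other snapshots still need the original before destroying
it:

  # send the dataset to a new name on the same pool
  zfs snapshot tank/data@rebalance
  zfs send tank/data@rebalance | zfs receive tank/data.copy

  # drop the old copy (-r also removes its snapshots), then move the
  # new one into place and clean up the transfer snapshot
  zfs destroy -r tank/data
  zfs rename tank/data.copy tank/data
  zfs destroy tank/data@rebalance

zfs send -R can carry child datasets and existing snapshots along if you
want to keep them, at the cost of copying those too.

-B

--
Brandon High : bhigh at freaks.com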