On 2012-12-20 18:25, sol wrote:
> Hi
>
> I know some of this has been discussed in the past but I can't quite
> find the exact information I'm seeking
> (and I'd check the ZFS wikis but the websites are down at the moment).
>
> Firstly, which is correct, free space shown by "zfs list" or by
> "zpool iostat"? (...)
> (That's a big difference, and the percentage doesn't agree)
I believe zpool iostat (and zpool list) report raw storage accounting -
basically, the number of HDD sectors available and consumed, including
redundancy and metadata (so the available space also includes the
not-yet-consumed redundancy overhead), as well as the reserved space
(roughly 1/64 of the pool size kept for system use - partly as an
attempt to counter the performance degradation on full pools that you
mention below).
zfs list displays user-data accounting - what is available after
redundancy and system reservations, and is in general subject to
"(ref)reservation" and "(ref)quota" settings on datasets in the pool.
When cloning, dedup and compression come into play, this accounting
becomes tricky.
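For example, to see the two views side by side (here "tank" is just a
placeholder for your pool name):
# zpool list tank
# zfs list -o space tank
The zpool line shows the raw SIZE/ALLOC/FREE figures, while
"zfs list -o space" breaks the usable space of each dataset down into
AVAIL, USED, USEDSNAP, USEDDS, USEDREFRESERV and USEDCHILD.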
Overall, there is one number you can trust if you limit or bill by
consumption: the used space of a dataset says how much userdata
(including directory structures, and after compression) is referenced
in this filesystem - the end-user value of your service. This does not
mean that only this filesystem references those blocks, though. The
other numbers are vaguer (e.g. with good dedup+compression ratios the
sum of the used spaces can come to much more than the raw pool size).
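If compression or dedup are in use, the ratios can be checked directly
(again, "tank" and "tank/somefs" are placeholders for your pool and
dataset names):
# zfs get compressratio,used,referenced tank/somefs
# zpool get dedupratio tank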
> Secondly, there's 8 vdevs each of 11 disks.
> 6 vdevs show used 8.19 TB, free 1.81 TB, free = 18.1%
> 2 vdevs show used 6.39 TB, free 3.61 TB, free = 36.1%
How did you look that up? ;)
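(For the record, per-TLVDEV usage can be seen with something like the
following, "tank" again being a placeholder:
# zpool iostat -v tank
which prints the capacity alloc/free columns for the pool as a whole
and for each top-level vdev.)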
>
> I've heard that
> a) performance degrades when free space is below a certain amount
Basically, the "mechanics" of the degradation is that ZFS writes new
data into available space "bubbles" within a range called a "metaslab".
It tries to make writes sequential so they complete faster. If your
pool has seen lots of writes and deletions, its free space may have
become fragmented, so the search for the "bubbles" takes longer, and
the ones found may be too small to fit the whole incoming transaction -
leading to more HDD seeks and thus more latency on writes. In the
extreme case, ZFS can't even find holes big enough for a single block,
so it splits the block data into several pieces and writes
"gang blocks", using many tiny IOs with many mechanical HDD seeks.
The numbers - how full a pool must be before it displays these
problems - are highly individual. Some pools have seen it after filling
to 60%, 80-90% is typical, and on write-only pools you might never see
the problem because you don't delete stuff (well, except maybe metadata
rewritten during updates, all of which usually consumes 1-3% of the
total allocation).
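If you want to peek at how chopped-up the free space actually is, zdb
can dump the metaslab information (read-only, but it may take a while
on a large pool; "tank" is a placeholder):
# zdb -m tank | less
This lists each metaslab with its free space; a doubled flag (zdb -mm)
should also print the individual free segments, so you can see whether
the remaining space consists of many small "bubbles".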
> b) data is written to different vdevs depending on free space
There are several rules which influence the preference of a top-level
VDEV and of a metaslab region inside it; these probably include free
space, the known presence of large "bubbles" to write into, and the
location on the disk (faster vs. slower LBA tracks).
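One simple way to watch the effect is an interval iostat while the pool
is being written to, e.g.:
# zpool iostat -v tank 5
and see how the write operations (and the growth of "alloc") are spread
over the top-level vdevs - emptier ones typically receive a larger
share of new writes.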
>
> So a) how do I determine the exact value when performance degrades and
> how significant is it?
> b) has that threshold been reached (or exceeded?) in the first six vdevs?
> and if so are the two emptier vdevs being used exclusively to prevent
> performance degrading
> so it will only degrade when all vdevs reach the magic 18.1% free (or
> whatever it is)?
Hopefully, this was answered above :)
> Presumably there's no way to identify which files are on which vdevs
> in order to delete them and recover the performance?
It is possible, but not simple, and is not guaranteed to get the
result you want (though there is little harm in trying).
You can use "zdb" to extract information about an inode on a dataset
as a listing of block pointer entries which form a tree for this file.
For example:
# ls -lani /lib/libnsl.so.1
9239 -rwxr-xr-x 1 0 2 649720 Jun 8 2012 /lib/libnsl.so.1
# df -k /lib/libnsl.so.1
Filesystem kbytes used avail capacity Mounted on
rpool/ROOT/oi_151a4 61415424 452128 24120824 2% /
Here the first number from "ls -i" gives us the inode number of the
file, and "df" confirms the dataset name. So we can do the zdb walk:
# zdb -ddddd -bbbbbb rpool/ROOT/oi_151a4 9239
Dataset rpool/ROOT/oi_151a4 [ZPL], ID 5299, cr_txg 1349648, 442M,
8213 objects, rootbp DVA[0]=<0:a6921d600:200>
DVA[1]=<0:2ffc7b400:200>
[L0 DMU objset] fletcher4 lzjb LE contiguous unique double
size=800L/200P birth=4682209L/4682209P fill=8213
cksum=16f122cb05:77d20eea7b8:155c69ed5a6ce:2b90104e19641f
Object lvl iblk dblk dsize lsize %full type
9239 2 16K 128K 642K 640K 100.00 ZFS plain file
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 4
path /lib/libnsl.so.1
uid 0
gid 2
atime Fri Jun 8 00:22:17 2012
mtime Fri Jun 8 00:22:17 2012
ctime Fri Jun 8 00:22:17 2012
crtime Fri Jun 8 00:22:17 2012
gen 1349746
mode 100755
size 649720
parent 25
links 1
pflags 40800000104
Indirect blocks:
0 L1 DVA[0]=<0:940298000:400>
DVA[1]=<0:263234a00:400>
[L1 ZFS plain file] fletcher4 lzjb LE contiguous unique double
size=4000L/400P birth=1349746L/1349746P fill=5
cksum=682d4fda0b:3cc1aa306094:13ebb22837cf14:4c5c67e522dbca8
0 L0 DVA[0]=<0:95f337000:20000> [L0 ZFS plain file]
fletcher4 uncompressed LE contiguous unique single size=20000L/20000P
birth=1349746L/1349746P fill=1
cksum=23fce6aa160b:5ab11e5fcbc6c2e:5b38f230e01d508d:12cf92941e4b2487
20000 L0 DVA[0]=<0:95f357000:20000> [L0 ZFS plain file]
fletcher4 uncompressed LE contiguous unique single size=20000L/20000P
birth=1349746L/1349746P fill=1
cksum=3f0ac207affd:f8ed413113d6bdd:24e36c7682cfc297:2549c866ab61e464
40000 L0 DVA[0]=<0:95f377000:20000> [L0 ZFS plain file]
fletcher4 uncompressed LE contiguous unique single size=20000L/20000P
birth=1349746L/1349746P fill=1
cksum=3d40bf3329f0:f459bc876303dd7:2230ee348b7b08c5:3a65d1ebbf52c9dc
60000 L0 DVA[0]=<0:95f397000:20000> [L0 ZFS plain file]
fletcher4 uncompressed LE contiguous unique single size=20000L/20000P
birth=1349746L/1349746P fill=1
cksum=19e01b53eb67:956b52d1df6ecd4:38ff9bd1302bf879:e4661798dd1ae8a0
80000 L0 DVA[0]=<0:95f3b7000:20000> [L0 ZFS plain file]
fletcher4 uncompressed LE contiguous unique single size=20000L/20000P
birth=1349746L/1349746P fill=1
cksum=361e6fd03d40:d0903e491fa09e9:7a2e453ed28baa92:28562c53af3c0495
segment [0000000000000000, 00000000000a0000) size 640K
After several higher layers of pointers (just L1 in the example above),
you get the "L0" entries, whose DVA fields point to the actual data
blocks.
The example file above fits in five 128K blocks at level L0.
The first component of the DVA address is the top-level vdev ID,
followed by offset and allocation size (including raidzN redundancy).
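To take one DVA from the listing above: in DVA[0]=<0:95f337000:20000>
the leading "0" is the top-level vdev ID, 0x95f337000 is the offset
within that vdev, and 0x20000 (128K) is the allocated size. The vdev
IDs can be matched to devices by their order in "zpool status" output,
or via the "id" fields printed by:
# zdb -C tank
("tank" again standing in for your pool name).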
Depending on your pool's history, however, larger files may have been
striped over several TLVDEVs, and relocating them (copying them over
and deleting the original) may or may not help free up a particular
TLVDEV (upon rewrite they will be striped again, though ZFS may make
different decisions for the new write and prefer the emptier devices).
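If you do want to try rewriting a file so its blocks land elsewhere, a
naive sketch (the path is just an example, and this assumes an exact
copy plus rename is acceptable to the application) could be:
# cp -p /tank/data/bigfile /tank/data/bigfile.rewrite
# mv /tank/data/bigfile.rewrite /tank/data/bigfile
The old blocks are only freed once nothing else references them, as
noted below.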
Also, if the file's blocks are referenced via snapshots, clones,
dedup or hardlinks, they won't actually be released when you delete
a particular copy of the file.
HTH,
//Jim Klimov