I've been going through my iostat, zilstat, and other outputs all to no avail. None of my disks ever seem to show outrageous service times, the load on the box is never high, and if the darned thing is CPU bound, I'm not even sure where to look.

"(traversing DDT blocks even if in memory, etc - and kernel times indeed are above 50%) as I'm zeroing "deleted" blocks inside the "internal" pool. This took several days already, but recovered lots of space in my main pool also..."

When you say you are zeroing deleted blocks, how are you going about doing that?

Despite claims to the contrary, I can understand ZFS needing some tuning. What I can't understand are the baffling differences in performance I see. For example, after deleting a large volume my performance will suddenly skyrocket and then gradually degrade. The question is why.

I'm not running dedup. My disks seem to be largely idle. I have eight 3GHz cores that also seem to be idle. I seem to have enough memory. What is ZFS doing during this time?

Everything I've read suggests one of two possible causes: too full, or bad hardware. Is there anything else that might be an issue here? Another ZFS factor I haven't taken into account?

Space seems to be the biggest factor in my performance differences (more free space = more performance), but as my fullest disks are less than 70% full and my emptiest disks are less than 10% full, I can't understand why space is an issue.

I have a few hardware errors on one of my pool disks, but we're talking about a very small number of errors over a long period of time. I'm considering replacing this disk, but the pool is so slow at times that I'm loath to slow it down further by doing a replace unless I can be more certain that will fix the problem.
-- This message posted from opensolaris.org
Well, as I wrote in other threads, I have a pool named "pool" on physical disks, and a compressed volume in this pool which I loopback-mount over iSCSI to make another pool named "dcpool". When files in "dcpool" are deleted, the blocks are not zeroed out by current ZFS, so they remain allocated in the physical "pool". Now I'm doing essentially this to clean up the parent pool:

# dd if=/dev/zero of=/dcpool/nodedup/bigzerofile

This file is in a non-deduped dataset, so from the point of view of "dcpool" it is a huge, growing file filled with zeroes, and its referenced blocks overwrite garbage left over from older deleted files no longer referenced by "dcpool". For the parent "pool", however, each of these writes is a compressed zeroed block, which does not need to be referenced, so the "pool" releases a volume block and its referencing metadata block. This has already released over half a terabyte in my physical pool (compressed blocks filled with zeroes are a special case for ZFS and require no, or fewer-than-usual, reference metadata blocks) ;)

However, since I have millions of 4KB blocks for volume data and its metadata, I guess fragmentation is quite high, maybe even interlacing one-to-one. One way or another, this "dcpool" never saw I/O faster than, say, 15MB/s, and usually lingers in the 1-5MB/s range, while I can easily get 30-50MB/s in other datasets of the "pool" (with dynamic block sizes and lengthier contiguous data stretches). Writes had been relatively quick for the first virtual terabyte or so, but it's been doing the last 100GB for several days now, at several megabytes per minute in the "dcpool" iostat. There are several MB/sec of I/Os on the hardware disks backing this deletion and clean-up, however (as in my examples in the previous post)...

As for disks with different fill ratios, that is a commonly discussed performance problem. It seems to boil down to this: free space on all disks (actually on top-level VDEVs) is considered when round-robining writes to stripes.
Disks that have been in use for a longer time may have very fragmented free space on one hand, and not so much of it on the other, but ZFS is still trying to push bits around evenly. And while it's waiting on some disks, others may be blocked as well. Something like that...

People on this forum have seen and reported that adding a 100MB file tanked their multi-terabyte pool's performance, and that removing the file boosted it back up. I don't want to mix up other writers' findings; better to search the recent 5-10 pages of forum post headings yourself. It's within the last hundred threads, I think, maybe ;)
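The zero-fill clean-up described above boils down to one dd plus a cleanup. A minimal sketch follows; note the target path would be the one from this thread (/dcpool/nodedup/bigzerofile), but a temp file is substituted here so the sketch runs anywhere, and a count= limit is added so it terminates (in practice you let dd run until the pool is nearly full):

```shell
# Hypothetical stand-in for /dcpool/nodedup/bigzerofile, a file in a
# non-deduped dataset of the inner pool.
ZEROFILE="${ZEROFILE:-$(mktemp)}"

# Write zeroes. On a compressed backing volume, compressed zero blocks
# replace (and thus free) the leftover garbage blocks in the parent pool.
# count=8 keeps the demo small; omit it to fill the pool.
dd if=/dev/zero of="$ZEROFILE" bs=1M count=8 2>/dev/null

SIZE=$(wc -c < "$ZEROFILE")
echo "wrote $SIZE bytes of zeroes"

# Remove the file afterwards so the inner pool itself does not stay full.
rm -f "$ZEROFILE"
```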
Hung-ShengTsao (Lao Tsao) Ph.D.
2011-May-10 22:36 UTC
[zfs-discuss] Performance problem suggestions?
It is my understanding that for (fast) writes you should consider a faster HDD (or SSD) for the ZIL, and for reads a faster HDD (or SSD) for the L2ARC. There have been many discussions concluding that for a V12N (virtualization) environment, a mirror (raid1) is better than raidz.

On 5/10/2011 3:31 PM, Don wrote:
> I've been going through my iostat, zilstat, and other outputs all to no avail. None of my disks ever seem to show outrageous service times, the load on the box is never high, and if the darned thing is CPU bound, I'm not even sure where to look.
> [...]
> # dd if=/dev/zero of=/dcpool/nodedup/bigzerofile

Ahh, I misunderstood your pool layout earlier. Now I see what you were doing.

> People on this forum have seen and reported that adding a 100MB file tanked their
> multi-terabyte pool's performance, and removing the file boosted it back up.

Sadly, I think several of those posts were mine or those of coworkers.

> Disks that have been in use for a longer time may have very fragmented free
> space on one hand, and not so much of it on the other, but ZFS is still trying to push
> bits around evenly. And while it's waiting on some disks, others may be blocked as
> well. Something like that...

This could explain why performance would go up after a large delete, but I've not seen large wait times for any of my disks. The service time, percent busy, and every other metric continue to show nearly idle disks. If this is the problem, it would be nice if there were a simple zfs or dtrace query that would show it to you.
> > Disks that have been in use for a longer time may have very fragmented free
> > space on one hand, and not so much of it on the other, but ZFS is still trying to push
> > bits around evenly. And while it's waiting on some disks, others may be blocked as
> > well. Something like that...
>
> This could explain why performance would go up after a large delete but I've not
> seen large wait times for any of my disks. The service time, percent busy, and
> every other metric continue to show nearly idle disks.

I believe that in this situation the older, fuller disks would show some activity while the others can show zero or few I/Os, because ZFS has no tasks for them. It sent out a series of blocks to write from the queue; the newer disks wrote theirs and stay dormant, while the older disks seek around to fit their piece of data... When the old disks complete the writes, ZFS batches them a new set of tasks.

> If this is the problem- it would be nice if there were a simple zfs or dtrace query
> that would show it to you.

Well, it seems that the bridge between the email and web interfaces to the OpenSolaris forums has been fixed, for new posts at least, and hopefully Richard Elling or some other experts will come up with an idea for a dtrace script for your situation. I have a little non-zero hope that the experts will also come to the web forums, review the past month's posts, and give their comments on my, your, and others' questions and findings ;)

//Jim Klimov
Keep in mind zfs_vdev_max_pending. In the latest version of S10 this is set to 10; ZFS will not issue more than that many requests at a time to a LUN. Your disks may look relatively idle while ZFS has a lot of data piled up inside, just waiting to be read or written. I have tweaked this on the fly. One key indicator is if your disk queues hover around 10.

Jim

----- Original Message -----
From: jimklimov at cos.ru
To: zfs-discuss at opensolaris.org
Sent: Wednesday, May 11, 2011 3:22:19 AM GMT -08:00 US/Canada Pacific
Subject: Re: [zfs-discuss] Performance problem suggestions?
[...]
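For reference, the zfs_vdev_max_pending tunable mentioned above can be inspected live with mdb (echo zfs_vdev_max_pending/D | mdb -k) or set persistently through /etc/system. A sketch, with an illustrative value rather than a recommendation:

```
* /etc/system fragment: cap the per-vdev I/O queue depth
* (takes effect after a reboot; the value 4 is only an example)
set zfs:zfs_vdev_max_pending = 4
```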
> It sent a series of blocks to write from the queue, newer disks wrote them and stay
> dormant, while older disks seek around to fit that piece of data... When old disks
> complete the writes, ZFS batches them a new set of tasks.

The thing is, as far as I know the OS doesn't ask the disk to find a place to fit the data. Instead the OS tracks what space on the disk is free and then tells the disk where to write the data. Even if ZFS were waiting for the I/O to complete, I would expect to see that delay reflected in the disk service times. In our case we see no high service times, no busy disks, nothing. It seems like ZFS is just sitting there quietly, thinking to itself. If the processor were busy that might make sense, but even there, our processor seems largely idle.

At the same time, even a scrub on this system is a joke right now, and that's a read-intensive operation. I'm seeing a scrub speed of 400K/s but almost no I/Os to my disks.
> The thing is- as far as I know the OS doesn't ask the disk to find a place
> to fit the data. Instead the OS tracks what space on the disk is free and
> then tells the disk where to write the data.

Yes and no; I did not formulate my idea clearly enough, sorry for the confusion ;)

Yes - the disks don't care about free blocks at all. To them they are just LBA sector numbers.

No - the OS does track which sectors correlate to the logical blocks it deems suitable for a write, and asks the disk to position its mechanical head to a specific track and access a specific sector. This is a slow operation which can only be done about 180-250 times per second for very random I/O (maybe more with HDD/controller caching, queuing, and faster spindles). I'm afraid that seeking to very dispersed metadata blocks, such as traversing the tree during a scrub on a fragmented drive, may qualify as very random I/O.

This reminds me of the long-languishing "BP rewrite" project, which would allow live rearranging of ZFS data, enabling, in particular, some extent of defragmentation. More useful applications would be changes to RAIDZ levels and the number of disks, though, maybe even removal of top-level VDEVs from a sufficiently empty pool... Hopefully the Illumos team or some other developers will push this idea into reality ;)

There was a good tip from Jim Litchfield regarding VDEV queue sizing, though. The current default for zfs_vdev_max_pending is 10, which is okay (or maybe even too much) for individual drives, but is not very much for arrays of many disks hidden behind a smart controller with its own caching and queuing, be it a SAN box controller or a PCI one which intercepts and reinterprets your ZFS's calls. So maybe this is indeed a bottleneck, which you would see in "iostat -xn 1" as "actv" field numbers near the configured queue size.

//Jim
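The "actv" check above is easy to automate: filter iostat output for devices whose average active queue sits near the configured limit. A sketch, assuming the Solaris "iostat -xn" column layout (actv is the 6th field, the device name is last); the sample line piped in below is fabricated for illustration:

```shell
# awk filter: print any device whose actv is within 1 of the queue limit.
# In practice you would pipe live output through it:
#   iostat -xn 5 | awk -v limit=10 "$CHECK"
CHECK='$6 + 0 >= limit - 1 && $NF != "device" { print $NF " actv=" $6 }'

# Demonstration on one fabricated sample line (iostat -xn field order:
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device):
echo '  12.0  340.0  96.0 2720.0  0.0  9.8  0.0 28.8  0 97 c0t2d0' \
  | awk -v limit=10 "$CHECK"
# -> c0t2d0 actv=9.8
```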
> This is a slow operation which can only be done about 180-250 times per second
> for very random I/Os (may be more with HDD/Controller caching, queuing and
> faster spindles).
> I'm afraid that seeking to very dispersed metadata blocks, such as traversing the
> tree during a scrub on a fragmented drive, may qualify as a very random I/O.

And that's the thing: I would understand if my scrub was slow because the disks were being hammered by IOPS, but, all joking aside, my pool is almost entirely idle according to iostat -xn.
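For reference, the 180-250 ops/s figure quoted above follows directly from mechanical latency: one random I/O costs roughly one average seek plus half a platter rotation. A back-of-envelope sketch (the drive numbers are illustrative, for a 15k rpm disk; a 7200 rpm SATA drive with ~8.5 ms average seeks lands near 80 IOPS by the same arithmetic):

```shell
# Random-I/O ceiling for a 15000 rpm drive with ~3.5 ms average seek.
# Average rotational latency = half a revolution = 60000 ms / rpm / 2.
awk 'BEGIN {
  rpm = 15000; seek_ms = 3.5
  rot_ms = 60000 / rpm / 2                    # 2.0 ms
  io_ms  = seek_ms + rot_ms                   # 5.5 ms per random I/O
  printf "~%.0f random IOPS\n", 1000 / io_ms  # prints ~182
}'
```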