We have an X4540 running Solaris 11 Express snv_151a that has developed an issue where its write performance is absolutely abysmal. Even touching a file takes over five seconds both locally and remotely.

/pool1/data# time touch foo

real    0m5.305s
user    0m0.001s
sys     0m0.004s

/pool1/data# time rm foo

real    0m5.912s
user    0m0.001s
sys     0m0.005s

The system exhibits this issue under the slightest load. We have sync=disabled set on all filesystems in this pool. The pool is at 75% capacity and is healthy. This issue started suddenly several days ago and persists after reboot. prstat shows zpool-pool1/150 taking 10% CPU constantly whereas other similar systems in our infrastructure under the same load do not. Even doing a 'zfs set' on a property takes up to 10 seconds and on other systems is instantaneous. Something appears to be blocking internally.

Both iostat and zpool iostat show very little to zero load on the devices even while blocking.

Any suggestions on avenues of approach for troubleshooting?

Thanks,

Adam
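One way to see where those five seconds go, as a minimal sketch assuming the stock Solaris truss, lockstat and DTrace facilities are available (the exact file path is just an example):

    # per-syscall latency of the slow touch; -D prints the elapsed time of each call
    truss -D touch /pool1/data/foo

    # sample the kernel with the profiling interrupt for 30 seconds and show the
    # top 20 call sites; useful for seeing what the busy zpool-pool1 thread is doing
    lockstat -kIW -D 20 sleep 30

If truss shows the delay inside the open()/creat() call while the disks stay idle, the time is being spent inside ZFS rather than on the devices.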
On Wed, May 4 at 12:21, Adam Serediuk wrote:
> Both iostat and zpool iostat show very little to zero load on the devices even while blocking.
>
> Any suggestions on avenues of approach for troubleshooting?

is 'iostat -en' error free?

--
Eric D. Mudama
edmudama at bounceswoosh.org
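For reference, a quick way to scan that output, assuming the usual s/w, h/w and trn error columns printed by iostat -e:

    # print only devices with a non-zero soft, hard or transport error count
    iostat -en | awk '$1 + $2 + $3 > 0'

A clean report shows zeroes in all three columns for every device.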
iostat doesn't show any high service times and fsstat also shows low throughput. Occasionally I can generate enough load that you do see some very high asvc_t, but when that occurs the pool is performing as expected. As a precaution I just added two extra drives to the zpool in case zfs was having difficulty finding a location to allocate new blocks, and it has made no difference. You can still see block allocation cycling evenly across the available mirrors.

iostat -xnz 3 before, during, and after the blocking:

                    extended device statistics
   r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   33.3    0.0   184.1  0.0  0.1    0.0    1.5   0   1 c12t0d0
   0.0   15.3    0.0    98.7  0.0  0.0    0.0    1.0   0   0 c12t1d0
   0.0   15.3    0.0    98.7  0.0  0.0    0.0    1.1   0   0 c12t2d0
   0.0   14.3    0.0    98.7  0.0  0.0    0.0    1.0   0   0 c12t6d0
   0.0   20.7    0.0   256.6  0.0  0.0    0.0    1.4   0   0 c11t5d0
   0.0   48.0    0.0   273.4  0.0  0.1    0.0    1.3   0   1 c11t6d0
   0.0   34.0    0.0   199.9  0.0  0.0    0.0    1.5   0   1 c11t7d0
   0.0   20.3    0.0   256.6  0.0  0.0    0.0    1.4   0   0 c10t5d0
   0.0   47.7    0.0   273.4  0.0  0.1    0.0    1.2   0   1 c10t6d0
   0.0   34.0    0.0   199.9  0.0  0.0    0.0    1.5   0   1 c10t7d0
   0.0    9.7    0.0  1237.9  0.0  0.0    0.0    4.7   0   1 c14t0d0
   0.0   33.3    0.0   184.1  0.0  0.1    0.0    1.8   0   1 c13t0d0
   0.0   15.7    0.0    98.7  0.0  0.0    0.0    1.2   0   0 c13t1d0
   0.0   15.7    0.0    98.7  0.0  0.0    0.0    1.0   0   0 c13t2d0
   0.0   13.7    0.0    98.7  0.0  0.0    0.0    1.1   0   0 c13t6d0
                    extended device statistics
   r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t1d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t2d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t3d0
   0.3    0.3    1.7     0.0  0.0  0.0    0.0    6.0   0   0 c8t4d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t5d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t6d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t7d0
   0.0    2.0    0.0     1.3  0.0  0.0    0.0    0.2   0   0 c12t0d0
   0.0   15.7    0.0   235.9  0.0  0.0    0.0    0.8   0   0 c12t1d0
   0.3   16.3    1.8   235.9  0.0  0.0    0.0    1.2   0   1 c12t2d0
   0.3   15.0    2.8   130.6  0.0  0.0    0.0    1.0   0   1 c12t3d0
   0.0   11.7    0.0   127.9  0.0  0.0    0.0    0.8   0   0 c12t4d0
   0.3    0.3    0.2     0.0  0.0  0.0    0.0    9.0   0   1 c12t5d0
   0.0   40.3    0.0   599.9  0.0  0.0    0.0    0.8   0   1 c12t6d0
   0.3    0.3    2.8     0.0  0.0  0.0    0.0    3.2   0   0 c11t0d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c11t1d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c11t2d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c11t3d0
   0.7    0.3    3.0     0.0  0.0  0.0    0.0    6.5   0   1 c11t4d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c11t5d0
   0.3    2.0    1.5     1.3  0.0  0.0    0.0    2.8   0   1 c11t6d0
   0.0    2.0    0.0     1.3  0.0  0.0    0.0    0.2   0   0 c11t7d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c10t0d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c10t1d0
   0.3    0.3    0.2     0.0  0.0  0.0    0.0    6.7   0   0 c10t2d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c10t3d0
   0.3    0.3    1.5     0.0  0.0  0.0    0.0    9.1   0   1 c10t4d0
   0.3    0.3    1.5     0.0  0.0  0.0    0.0    7.7   0   1 c10t5d0
   0.3    2.0    1.5     1.3  0.0  0.0    0.0    2.2   0   0 c10t6d0
   0.0    2.0    0.0     1.3  0.0  0.0    0.0    0.2   0   0 c10t7d0
   0.0    0.3    0.0    42.6  0.0  0.0    0.0    0.7   0   0 c14t1d0
   0.3    0.3    1.5     0.0  0.0  0.0    0.0    5.9   0   0 c9t1d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t2d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t3d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t4d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t5d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t6d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c9t7d0
   0.0    2.0    0.0     1.3  0.0  0.0    0.0    0.4   0   0 c13t0d0
   0.0   15.7    0.0   235.9  0.0  0.0    0.0    0.8   0   0 c13t1d0
   0.0   16.3    0.0   235.9  0.0  0.0    0.0    0.9   0   0 c13t2d0
   0.0   14.7    0.0   130.6  0.0  0.0    0.0    0.7   0   0 c13t3d0
   0.0   12.0    0.0   127.9  0.0  0.0    0.0    0.9   0   0 c13t4d0
   0.0    0.3    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c13t5d0
   0.0   39.6    0.0   599.9  0.0  0.0    0.0    0.8   0   1 c13t6d0
                    extended device statistics
   r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   12.0    0.0    45.2  0.0  0.0    0.0    0.4   0   0 c8t1d0
   0.0   32.3    0.0   175.7  0.0  0.1    0.0    2.4   0   1 c8t2d0
   0.0   34.3    0.0   175.7  0.0  0.1    0.0    3.1   0   1 c8t3d0
   0.0   32.3    0.0   175.7  0.0  0.1    0.0    2.6   0   1 c8t4d0
   0.0   24.3    0.0   203.8  0.0  0.0    0.0    1.0   0   1 c12t4d0
   0.0   23.3    0.0   142.0  0.0  0.0    0.0    0.8   0   0 c12t5d0
   0.0   12.3    0.0    45.2  0.0  0.0    0.0    0.4   0   0 c9t1d0
   0.0   31.3    0.0   175.7  0.0  0.1    0.0    2.2   0   1 c9t2d0
   0.0   33.3    0.0   175.7  0.0  0.1    0.0    2.2   0   1 c9t3d0
   0.0   31.7    0.0   175.7  0.0  0.1    0.0    2.5   0   1 c9t4d0
   0.0   24.0    0.0   203.8  0.0  0.0    0.0    0.9   0   1 c13t4d0
   0.0   23.0    0.0   142.0  0.0  0.0    0.0    0.8   0   0 c13t5d0
                    extended device statistics

> is 'iostat -en' error free?

it is error free for all devices

On May 4, 2011, at 12:28 PM, Michael Schuster wrote:

> On Wed, May 4, 2011 at 21:21, Adam Serediuk <aserediuk at gmail.com> wrote:
>> We have an X4540 running Solaris 11 Express snv_151a that has developed an issue where its write performance is absolutely abysmal. Even touching a file takes over five seconds both locally and remotely.
>>
>> /pool1/data# time touch foo
>>
>> real 0m5.305s
>> user 0m0.001s
>> sys 0m0.004s
>>
>> /pool1/data# time rm foo
>>
>> real 0m5.912s
>> user 0m0.001s
>> sys 0m0.005s
>>
>> The system exhibits this issue under the slightest load. We have sync=disabled set on all filesystems in this pool. The pool is at 75% capacity and is healthy. This issue started suddenly several days ago and persists after reboot. prstat shows zpool-pool1/150 taking 10% CPU constantly whereas other similar systems in our infrastructure under the same load do not. Even doing a 'zfs set' on a property takes up to 10 seconds and on other systems is instantaneous. Something appears to be blocking internally.
>>
>> Both iostat and zpool iostat show very little to zero load on the devices even while blocking.
>
> it'd be interesting to see a complete iostat run while this is
> happening - I'd suggest 'iostat -xnz 3' or sth like that. look for
> high svc_t values ...
>
> HTH
> Michael
> --
> regards/mit freundlichen Grüssen
> Michael Schuster
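The per-mirror allocation pattern described above can also be watched directly; a sketch, assuming the pool name from earlier in the thread:

    # per-vdev (per-mirror) IOPS and bandwidth every 3 seconds; shows which mirror sets receive writes
    zpool iostat -v pool1 3

Comparing that against iostat -xnz 3 over the same interval makes it easier to tell whether the pool is idle because nothing is being issued, or busy on only a few vdevs.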
Dedup is disabled (confirmed to be.) Doing some digging it looks like this is a very similar issue to http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.

On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:

> My first thought is dedup... perhaps you've got dedup enabled and the DDT no longer fits in RAM? That would create a huge performance cliff.
>
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org on behalf of Eric D. Mudama
> Sent: Wed 5/4/2011 12:55 PM
> To: Adam Serediuk
> Cc: zfs-discuss at opensolaris.org
> Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
>
> On Wed, May 4 at 12:21, Adam Serediuk wrote:
> > Both iostat and zpool iostat show very little to zero load on the devices even while blocking.
> >
> > Any suggestions on avenues of approach for troubleshooting?
>
> is 'iostat -en' error free?
>
> --
> Eric D. Mudama
> edmudama at bounceswoosh.org
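A couple of ways to double-check that, as a sketch using the pool name from earlier in the thread:

    # every dataset in the pool should report dedup=off
    zfs get -r dedup pool1

    # summarize the dedup table, if any; an empty DDT means no deduplicated blocks exist
    zdb -DD pool1

zdb walks on-disk state and can take a while on a large pool, so it is best run when the extra read load is acceptable.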
Sounds like a nasty bug, and not one I've seen in illumos or NexentaStor. What build are you running?

  - Garrett

On Wed, 2011-05-04 at 15:40 -0700, Adam Serediuk wrote:
> Dedup is disabled (confirmed to be.) Doing some digging it looks like
> this is a very similar issue
> to http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.
>
> On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:
>
> > My first thought is dedup... perhaps you've got dedup enabled and
> > the DDT no longer fits in RAM? That would create a huge performance
> > cliff.
> >
> > -----Original Message-----
> > From: zfs-discuss-bounces at opensolaris.org on behalf of Eric D. Mudama
> > Sent: Wed 5/4/2011 12:55 PM
> > To: Adam Serediuk
> > Cc: zfs-discuss at opensolaris.org
> > Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
> >
> > On Wed, May 4 at 12:21, Adam Serediuk wrote:
> > > Both iostat and zpool iostat show very little to zero load on the
> > devices even while blocking.
> > >
> > > Any suggestions on avenues of approach for troubleshooting?
> >
> > is 'iostat -en' error free?
> >
> > --
> > Eric D. Mudama
> > edmudama at bounceswoosh.org
On May 4, 2011, at 4:16 PM, Victor Latushkin wrote:

> Try
>
> echo metaslab_debug/W1 | mdb -kw
>
> If it does not help, reset it back to zero
>
> echo metaslab_debug/W0 | mdb -kw

That appears to have resolved the issue! Within seconds of making the change performance has increased by an order of magnitude. I was typing the reply below when your message came in.

Is this bug 7000208?

On May 4, 2011, at 4:01 PM, Garrett D'Amore wrote:

> Sounds like a nasty bug, and not one I've seen in illumos or
> NexentaStor. What build are you running?

running snv_151a

Running some synthetic tests right now and comparing the various stats, one thing that stands out as very different on this system compared to our others is that writes seem to be going to ~5 mirror sets at a time (of 22 configured.) The next batch of writes will move on to the next ~5 mirror sets, and so forth cycling around. The other systems will write to many more mirror sets simultaneously. This particular machine does not appear to be buffering writes and appears to be doing everything sync to disk despite having sync/zil disabled.

I'm trying to do a little more introspection into the zpool thread that is using cpu but not having much luck finding anything meaningful. Occasionally the cpu usage for that thread will drop, and when it does performance of the filesystem increases.

> On Wed, 2011-05-04 at 15:40 -0700, Adam Serediuk wrote:
>> Dedup is disabled (confirmed to be.) Doing some digging it looks like
>> this is a very similar issue
>> to http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.
>>
>> On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:
>>
>>> My first thought is dedup... perhaps you've got dedup enabled and
>>> the DDT no longer fits in RAM? That would create a huge performance
>>> cliff.
>>>
>>> -----Original Message-----
>>> From: zfs-discuss-bounces at opensolaris.org on behalf of Eric D. Mudama
>>> Sent: Wed 5/4/2011 12:55 PM
>>> To: Adam Serediuk
>>> Cc: zfs-discuss at opensolaris.org
>>> Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
>>>
>>> On Wed, May 4 at 12:21, Adam Serediuk wrote:
>>>> Both iostat and zpool iostat show very little to zero load on the devices even while blocking.
>>>>
>>>> Any suggestions on avenues of approach for troubleshooting?
>>>
>>> is 'iostat -en' error free?
>>>
>>> --
>>> Eric D. Mudama
>>> edmudama at bounceswoosh.org
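A follow-up note on the workaround, as a sketch worth verifying against the exact build in use: metaslab_debug is a variable in the zfs kernel module, so the mdb -kw change above only lasts until the next reboot. If it holds up as the fix, the usual way to make it persistent on Solaris-derived systems is an /etc/system entry:

    * keep metaslab space maps in memory to avoid slow allocation on a fuller pool
    set zfs:metaslab_debug = 1

The mdb form remains handy for testing because it can be flipped back with metaslab_debug/W0 without a reboot.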