I'm moving some data off an old machine to something reasonably new.
Normally, the new machine performs better, but I have one case just now
where the new system is terribly slow.

Old machine - V880 (Solaris 8) with SVM raid-5:

# ptime du -kds foo
15043722        foo

real        6.955
user        0.964
sys         5.492

And now the new machine - T5140 (latest Solaris 10) with ZFS striped
atop a bunch of 2530 arrays:

# ptime du -kds foo
15343120        foo

real     2:55.210
user        2.559
sys      2:05.788

It's not just du; a find on that directory is similarly bad.

I have other filesystems of similar size and number of files (there are
only about 200K files) that perform well, so there must be something
about this filesystem that is throwing zfs into a spin. Anybody else
seen anything like this?

I'm suspicious of ACL handling. So for a quick test I took one directory
with approx 5000 files in it and timed du (I'm running all this as root,
btw):

1. Just the files, no ACLs.

real        0.238
user        0.050
sys         0.187

2. Files with ACLs:

real        0.467
user        0.055
sys         0.411

3. Files with ACLs, and an ACL on the directory:

real        0.610
user        0.058
sys         0.551

I don't know whether that explains all of the problem, but it's clear
that having ACLs on files and directories has a definite cost.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
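For anyone wanting to repeat the ACL comparison above, a minimal sketch
follows. The directory names and the user in the ACL entries are made up
for illustration; the chmod A+... form is the Solaris NFSv4 ACL syntax
used on ZFS:

    # Hypothetical directories, each pre-populated with ~5000 files.

    # 1. Baseline: just the files, no ACLs.
    ptime du -kds /tank/acltest-plain

    # 2. Add an ACL entry to every file (user "fred" is hypothetical).
    find /tank/acltest-files -type f -exec chmod A+user:fred:read_data:allow {} +
    ptime du -kds /tank/acltest-files

    # 3. As above, plus an ACL on the directory itself.
    chmod A+user:fred:list_directory:allow /tank/acltest-dir
    ptime du -kds /tank/acltest-dir

Running ls -v on one of the files confirms the extra ACL entries are in
place before timing.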
Hello Peter,

Friday, February 13, 2009, 10:41:54 AM, you wrote:

PT> I'm moving some data off an old machine to something reasonably new.
PT> Normally, the new machine performs better, but I have one case just now
PT> where the new system is terribly slow.

PT> Old machine - V880 (Solaris 8) with SVM raid-5:

PT> # ptime du -kds foo
PT> 15043722        foo

PT> real        6.955
PT> user        0.964
PT> sys         5.492

PT> And now the new machine - T5140 (latest Solaris 10) with ZFS striped
PT> atop a bunch of 2530 arrays:

PT> # ptime du -kds foo
PT> 15343120        foo

PT> real     2:55.210
PT> user        2.559
PT> sys      2:05.788

PT> It's not just du; a find on that directory is similarly bad.

Maybe you have some extra tuning on the old server, like an increased
DNLC? I would check how many IOs (if any) you are doing during the find
(not a first run, of course).

On the other hand, your find/du will depend mostly on single-thread
performance, and as you can see above you are spending a relatively high
percentage of the time on CPU; your T2+ will most probably deliver less
single-thread performance than your V880.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
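A sketch of how one might compare DNLC behaviour and IO on the two
servers during a second (warm-cache) run; the ncsize value shown is only
an example, not a recommendation:

    # DNLC hit rate (reported as part of vmstat -s on Solaris).
    vmstat -s | grep 'name lookups'

    # Detailed DNLC counters.
    kstat -n dnlcstats

    # Physical IOs issued while the find/du runs, from another window.
    iostat -xn 1

    # If the old server has a larger DNLC, /etc/system may contain
    # something along these lines (example value only):
    #   set ncsize=262144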
On Sun, Feb 15, 2009 at 12:37 PM, Robert Milkowski <milek at task.gda.pl> wrote:
> Maybe you have some extra tuning on the old server, like an increased
> DNLC? I would check how many IOs (if any) you are doing during the find
> (not a first run, of course).
>
> On the other hand, your find/du will depend mostly on single-thread
> performance, and as you can see above you are spending a relatively high
> percentage of the time on CPU; your T2+ will most probably deliver less
> single-thread performance than your V880.

I know that. But 3 minutes against 6 seconds?

The thing is, it's just this one set of data that's slow - I've not
noticed this performance falling off a cliff with all the other data that
has been moved. (OK, there could be other datasets that have issues. But
most of them don't, and this one is obviously stuck in molasses.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Hello Peter,

Sunday, February 15, 2009, 12:54:40 PM, you wrote:

PT> I know that. But 3 minutes against 6 seconds?

PT> The thing is, it's just this one set of data that's slow - I've not
PT> noticed this performance falling off a cliff with all the other data
PT> that has been moved. (OK, there could be other datasets that have
PT> issues. But most of them don't, and this one is obviously stuck in
PT> molasses.)

Well, if on the old server you had tuned the DNLC so that after a first
pass it caches all the entries, while on the new one you haven't, then
there could be a huge difference in timing. What's the iostat -xn 1
output while doing the du/find on both servers (2nd run)?

The only thing that worries me is that on your new server you're still
using more than 70% of the CPU, so it doesn't necessarily look like you
are waiting for IOs - or it could be a combination of DNLC, single-thread
performance, ...

The dataset you are describing may also be hitting an issue with running
out of metaslabs, which then burns a lot of CPU - I don't believe it has
been fixed yet. One workaround is to limit your recordsize to 8K and then
copy the data again.

Anyway, without more details on IOs, DNLC and CPU utilization on both
servers it is really hard to say what your problem is.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
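A sketch of the recordsize workaround mentioned above, assuming a
hypothetical dataset name; recordsize only applies to newly written
files, which is why the data has to be copied again:

    # Create a new dataset capped at 8K records (names are made up).
    zfs create -o recordsize=8k tank/foo-new

    # Rewrite the data so every file is stored with the new record size;
    # on Solaris 10, cp -p attempts to preserve ACLs as well as
    # ownership and timestamps.
    cp -pr /tank/foo/. /tank/foo-new/

Alternatively, zfs set recordsize=8k on the existing dataset followed by
a copy into a fresh directory achieves the same thing for the rewritten
files.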