I'm moving some data off an old machine to something reasonably new.
Normally, the new machine performs better, but I have one case just now
where the new system is terribly slow.

Old machine - V880 (Solaris 8) with SVM raid-5:

# ptime du -kds foo
15043722        foo

real        6.955
user        0.964
sys         5.492

And now the new machine - T5140 (latest Solaris 10) with ZFS striped
atop a bunch of 2530 arrays:

# ptime du -kds foo
15343120        foo

real     2:55.210
user        2.559
sys      2:05.788

It's not just du; a find on that directory is similarly bad.

I have other filesystems of similar size and number of files (there are
only about 200K files) that perform well, so there must be something
about this filesystem that is throwing zfs into a spin. Anybody else
seen anything like this?

I'm suspicious of ACL handling. So for a quick test I took one directory
with approx 5000 files in it and timed du (I'm running all this as root,
btw):

1. Just the files, no ACLs.

real        0.238
user        0.050
sys         0.187

2. Files with ACLs:

real        0.467
user        0.055
sys         0.411

3. Files with ACLs, and an ACL on the directory:

real        0.610
user        0.058
sys         0.551

I don't know whether that explains all of the problem, but it's clear
that having ACLs on files and directories has a definite cost.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
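For anyone wanting to repeat the ACL comparison above, a minimal sketch
follows. The directory names and the user in the ACL entries are made up
for illustration; the chmod A+... form is the Solaris NFSv4 ACL syntax
used on ZFS:

    # Hypothetical directories, each pre-populated with ~5000 files.

    # 1. Baseline: just the files, no ACLs.
    ptime du -kds /tank/acltest-plain

    # 2. Add an ACL entry to every file (user "fred" is hypothetical).
    find /tank/acltest-files -type f -exec chmod A+user:fred:read_data:allow {} +
    ptime du -kds /tank/acltest-files

    # 3. As above, plus an ACL on the directory itself.
    chmod A+user:fred:list_directory:allow /tank/acltest-dir
    ptime du -kds /tank/acltest-dir

Running ls -v on one of the files confirms the extra ACL entries are in
place before timing.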
Hello Peter,

Friday, February 13, 2009, 10:41:54 AM, you wrote:

PT> I'm moving some data off an old machine to something reasonably new.
PT> Normally, the new machine performs better, but I have one case just now
PT> where the new system is terribly slow.

PT> Old machine - V880 (Solaris 8) with SVM raid-5:

PT> # ptime du -kds foo
PT> 15043722        foo

PT> real        6.955
PT> user        0.964
PT> sys         5.492

PT> And now the new machine - T5140 (latest Solaris 10) with ZFS striped
PT> atop a bunch of 2530 arrays:

PT> # ptime du -kds foo
PT> 15343120        foo

PT> real     2:55.210
PT> user        2.559
PT> sys      2:05.788

PT> It's not just du; a find on that directory is similarly bad.

Maybe you have some extra tuning on the old server, like an increased
DNLC? I would check how many IOs (if any) you are doing during the find
(not a first run, of course).

On the other hand, your find/du will depend mostly on single-thread
performance, and as you can see above you are spending a relatively high
percentage of the time on CPU; your T2+ will most probably deliver less
single-thread performance than your V880.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
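A sketch of how one might compare DNLC behaviour and IO on the two
servers during a second (warm-cache) run; the ncsize value shown is only
an example, not a recommendation:

    # DNLC hit rate (reported as part of vmstat -s on Solaris).
    vmstat -s | grep 'name lookups'

    # Detailed DNLC counters.
    kstat -n dnlcstats

    # Physical IOs issued while the find/du runs, from another window.
    iostat -xn 1

    # If the old server has a larger DNLC, /etc/system may contain
    # something along these lines (example value only):
    #   set ncsize=262144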
On Sun, Feb 15, 2009 at 12:37 PM, Robert Milkowski <milek at task.gda.pl> wrote:
> Maybe you have some extra tuning on the old server, like an increased
> DNLC? I would check how many IOs (if any) you are doing during the find
> (not a first run, of course).
>
> On the other hand, your find/du will depend mostly on single-thread
> performance, and as you can see above you are spending a relatively high
> percentage of the time on CPU; your T2+ will most probably deliver less
> single-thread performance than your V880.

I know that. But 3 minutes against 6 seconds?

The thing is, it's just this one set of data that's slow - I've not
noticed this performance falling off a cliff with all the other data that
has been moved. (OK, there could be other datasets that have issues. But
most of them don't, and this one is obviously stuck in molasses.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Hello Peter,

Sunday, February 15, 2009, 12:54:40 PM, you wrote:

PT> I know that. But 3 minutes against 6 seconds?

PT> The thing is, it's just this one set of data that's slow - I've not
PT> noticed this performance falling off a cliff with all the other data
PT> that has been moved. (OK, there could be other datasets that have
PT> issues. But most of them don't, and this one is obviously stuck in
PT> molasses.)

Well, if on the old server you had tuned the DNLC so that after a first
pass it caches all the entries, while on the new one you haven't, then
there could be a huge difference in timing. What's the iostat -xn 1
output while doing the du/find on both servers (2nd run)?

The only thing that worries me is that on your new server you're still
using more than 70% of the CPU, so it doesn't necessarily look like you
are waiting for IOs - or it could be a combination of DNLC, single-thread
performance, ...

The dataset you are describing may also be hitting an issue with running
out of metaslabs, which then burns a lot of CPU - I don't believe it has
been fixed yet. One workaround is to limit your recordsize to 8K and then
copy the data again.

Anyway, without more details on IOs, DNLC and CPU utilization on both
servers it is really hard to say what your problem is.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
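A sketch of the recordsize workaround mentioned above, assuming a
hypothetical dataset name; recordsize only applies to newly written
files, which is why the data has to be copied again:

    # Create a new dataset capped at 8K records (names are made up).
    zfs create -o recordsize=8k tank/foo-new

    # Rewrite the data so every file is stored with the new record size;
    # on Solaris 10, cp -p attempts to preserve ACLs as well as
    # ownership and timestamps.
    cp -pr /tank/foo/. /tank/foo-new/

Alternatively, zfs set recordsize=8k on the existing dataset followed by
a copy into a fresh directory achieves the same thing for the rewritten
files.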