thr3ads.net - Lustre discuss - [Lustre-discuss] Poor metadata operation performance [May 2011]

Ken Hornstein

2011-May-20 13:49 UTC

[Lustre-discuss] Poor metadata operation performance

So I guess there are some things I _still_ don't understand about Lustre
metadata handling.  Specifically, what metadata gets stored on OSTs and
why.

What brings this all up is that a) we have users who have lots of files
and b) we are currently going through a reorganization that requires
changing the groups on lots of these files (this is all running Lustre
1.8.4; we're due for an upgrade in the medium future).

I figured okay, this wouldn't be so bad, since those are all metadata
server operations.  But I started running some tests, and I found out
that chown() system calls perform poorly.

Because I was doing some previous metadata performance analysis, I took
a source code tree which consists of approximately 50,000 files and put
two copies in one of our Lustre filesystems: one with the default striping
(across all OSTs) and one where all files have no striping at all.  The
performance difference between these two trees for stat() calls is large,
as you can imagine, but the disparity between the chown() calls is even
larger.  You can run chgrp on all of the files in the unstriped copy in
about 3-5 seconds, but the striped copy takes more than 50 seconds.
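
A comparison like the one described above could be reproduced with a small timing script. This is only a minimal sketch, not the test that was actually run: the function name `time_chgrp` is hypothetical, and it assumes the caller is allowed to set the target group on every file in the tree.

```python
import os
import time


def time_chgrp(root, gid):
    """Set the group of every regular file entry under root.

    Returns (file_count, elapsed_seconds). Directories are walked but
    not changed, since the comparison above is about files.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # A uid of -1 leaves the owner untouched, so only the group
            # changes. follow_symlinks=False acts on the link itself.
            os.chown(os.path.join(dirpath, name), -1, gid,
                     follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start
```

Running it once against the striped copy and once against the unstriped copy, with the same numeric group ID, gives two elapsed times that can be compared directly.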

I did some more digging as to why this is.  I thought maybe at first that
this is an issue on the client, but there is code in there that skips
over talking to the OSTs for certain types of metadata updates, and turning
on debugging on the client verifies that no setattr RPCs are being sent
to the OSSes.  Looking more closely at the RPC traces reveals that the issue
is on the metadata server; the setattr RPCs simply take longer when the
files are striped.

I've looked at the metadata server code for a bit, and I've verified
that the metadata server does send setattr RPCs to the OSSes, but I see
that it's done asynchronously; it shouldn't be waiting for the
replies.  So I'm stumped as to why this is happening.  I also realize
that I'm still puzzled as to what metadata is stored on the OSTs; it seems
like the client prefers the metadata from the MDS (except of course for
size), but a fair amount of metadata is still stored on the OSSes.  Can
anyone shed some light on this?

--Ken

Andreas Dilger

2011-May-20 16:37 UTC

Ken, the OSTs need to track the ownership of objects for quota.  The more
stripes there are on a file, the more RPCs need to be sent, which is why we
don't recommend wide striping unless there is a reason for it
(bandwidth, size, etc).
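
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes one setattr RPC per OST object, meaning one per stripe of each file; that is an assumption for illustration, not something confirmed in this thread. The stripe counts 4 and 16 are hypothetical, since the number of OSTs in the filesystem is not stated.

```python
def ost_setattr_rpcs(num_files, stripe_count):
    """Estimate MDS-to-OST setattr RPCs, assuming one RPC per stripe object."""
    if num_files < 0 or stripe_count < 1:
        raise ValueError("need num_files >= 0 and stripe_count >= 1")
    return num_files * stripe_count


if __name__ == "__main__":
    # 50,000 files is the tree size from the original report; the
    # stripe counts are illustrative only.
    for stripes in (1, 4, 16):
        print(stripes, ost_setattr_rpcs(50_000, stripes))
```

Under that assumption the RPC count grows linearly with the stripe count, so a tree striped across many OSTs generates many times the OST traffic of an unstriped one for the same chgrp.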

Cheers, Andreas

On 2011-05-20, at 7:49 AM, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> So I guess there are some things I _still_ don't understand about Lustre
> metadata handling.  Specifically, what metadata gets stored on OSTs and
> why.
> 
> What brings this all up is that a) we have users who have lots of files
> and b) we are currently going through a reorganization that requires
> changing the groups on lots of these files (this is all running Lustre
> 1.8.4; we're due for an upgrade in the medium future).
> 
> I figured okay, this wouldn't be so bad, since those are all metadata
> server operations.  But I started running some tests, and I found out
> that chown() system calls perform poorly.
> 
> Because I was doing some previous metadata performance analysis, I took
> a source code tree which consists of approximately 50,000 files and put
> two copies in one of our Lustre filesystems: one with the default striping
> (across all OSTs) and one where all files have no striping at all.  The
> performance difference between these two trees for stat() calls is large,
> as you can imagine, but the disparity between the chown() calls is even
> larger.  You can run chgrp on all of the files in the unstriped copy in
> about 3-5 seconds, but the striped copy takes more than 50 seconds.
> 
> I did some more digging as to why this is.  I thought maybe at first that
> this is an issue on the client, but there is code in there that skips
> over talking to the OSTs for certain types of metadata updates, and turning
> on debugging on the client verifies that no setattr RPCs are being sent
> to the OSSes.  Looking more closely at the RPC traces reveals that the
> issue is on the metadata server; the setattr RPCs simply take longer when
> the files are striped.
> 
> I've looked at the metadata server code for a bit, and I've verified
> that the metadata server does send setattr RPCs to the OSSes, but I see
> that it's done asynchronously; it shouldn't be waiting for the
> replies.  So I'm stumped as to why this is happening.  I also realize
> that I'm still puzzled as to what metadata is stored on the OSTs; it seems
> like the client prefers the metadata from the MDS (except of course for
> size), but a fair amount of metadata is still stored on the OSSes.  Can
> anyone shed some light on this?
> 
> --Ken
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

Ken Hornstein

2011-May-20 16:47 UTC

> Ken, the OSTs need to track the ownership of objects for quota.  The more
> stripes there are on a file, the more RPCs that need to be sent, which is
> why we don't recommend wide striping unless there is a reason for it
> (bandwidth, size, etc).

Fair enough; I always forget about quota accounting, because we never use
it.  But I'm wondering why this in particular causes such a hit, because
the MDS sends the setattr RPCs asynchronously; in theory it should just
fire them off and not have to wait until they're done.  Perhaps it's the
overhead of sending those RPCs which is slowing things down?  I could
believe that, although I would have thought that it wouldn't be that bad.

--Ken

Andreas Dilger

2011-May-20 19:06 UTC

It would be interesting to find out what is causing the bottleneck. At one time
there was no throttle on the number of RPCs that the MDS could send, which
caused overload problems on the OSTs.

Now, the MDS is limited by the normal rpcs_in_flight tunable (=8) that clients
are limited to. It would be worthwhile to see whether increasing this helps
overall performance. If it does, it would make sense to tune the OSCs on the
MDS for more RPCs by default.
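
For anyone wanting to try this, the sketch below builds the `lctl` commands that would read and raise the limit on the MDS. The parameter path `osc.*.max_rpcs_in_flight` is an assumption about how the tunable is exposed and should be checked with `lctl list_param` on the running version; the value 32 is an arbitrary example, not a recommendation.

```python
import shlex

# Assumed parameter path for the tunable mentioned above; verify it
# with `lctl list_param` before relying on it.
PARAM = "osc.*.max_rpcs_in_flight"


def get_cmd(param=PARAM):
    """Build the command that prints the current value for every OSC."""
    return ["lctl", "get_param", param]


def set_cmd(value, param=PARAM):
    """Build the command that sets a new limit for every OSC."""
    if value < 1:
        raise ValueError("limit must be at least 1")
    return ["lctl", "set_param", f"{param}={value}"]


if __name__ == "__main__":
    # Only print the commands; run them by hand on the MDS as root.
    print(shlex.join(get_cmd()))
    print(shlex.join(set_cmd(32)))  # 32 is a hypothetical test value
```

The script only prints the commands rather than executing them, so the change can be reviewed first. Re-running the chgrp timing before and after the change would show whether the limit is the bottleneck.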

Cheers, Andreas

On 2011-05-20, at 10:47 AM, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
>> Ken, the OSTs need to track the ownership of objects for quota.  The more
>> stripes there are on a file, the more RPCs that need to be sent, which is
>> why we don't recommend wide striping unless there is a reason for it
>> (bandwidth, size, etc).
> 
> Fair enough; I always forget about quota accounting, because we never use
> it.  But I'm wondering why this in particular causes such a hit, because
> the MDS sends the setattr RPCs asynchronously; in theory it should just
> fire them off and not have to wait until they're done.  Perhaps it's the
> overhead of sending those RPCs which is slowing things down?  I could
> believe that, although I would have thought that it wouldn't be that bad.
> 
> --Ken
