Frederik Ferner
2010-Oct-29 08:40 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Hi List,

we occasionally have reports from our users that it takes a very long
time to access files on our Lustre file system. In particular they
complain that they can't see the files when doing ls on the directory;
the ls takes a very long time to return. If we wait long enough,
everything returns to normal until it happens again.

The last time this happened, we managed to do some limited debugging
and would like advice on how to debug this further and hopefully fix
the problem. Note that this is still on Lustre 1.6.7.2-ddn3.5, but we
are going to upgrade to 1.8 next week. Without knowing what the
problem is, we are not convinced the upgrade will help in this
instance.

Doing a 'strace -T -e file ls -n' on one directory with about 750
files, while users were seeing the hanging ls, showed lstat calls
taking seconds, up to 23s.

<snip strace extract>
lstat("164435.dat", {st_mode=S_IFREG|0660, st_size=6711, ...}) = 0 <4.899480>
getxattr("164435.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000291>
lstat("164436.dat", {st_mode=S_IFREG|0660, st_size=7939, ...}) = 0 <13.679477>
getxattr("164436.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000045>
lstat("164443.dat", {st_mode=S_IFREG|0660, st_size=10811, ...}) = 0 <14.126075>
getxattr("164443.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000047>
lstat("164448.dat", {st_mode=S_IFREG|0660, st_size=10804, ...}) = 0 <12.062267>
getxattr("164448.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000045>
lstat("164450.dat", {st_mode=S_IFREG|0660, st_size=10815, ...}) = 0 <23.535400>
getxattr("164450.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000047>
lstat("164453.dat", {st_mode=S_IFREG|0660, st_size=10809, ...}) = 0 <8.323233>
getxattr("164453.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000043>
lstat("164462.dat", {st_mode=S_IFREG|0660, st_size=10818, ...}) = 0 <15.436090>
getxattr("164462.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000028>
</snip strace>

During all this time there was nothing in syslog on any of the
servers; MDS and client both had a load of less than 0.2. The OSSes
were busier but did not look overloaded.

llstat -i 1 /proc/fs/lustre/mds/lustre01-MDT0000/stats showed open
calls at a rate of about 700-1200 per second:

<sample llstat output>
/proc/fs/lustre/mds/lustre01-MDT0000/stats @ 1288260173.745081
Name              Cur.Count  Cur.Rate  #Events
open                    904       904  19632468
close                   428       428  6264157
mknod                     0         0  20090
link                      0         0  32
unlink                    9         9  162146
mkdir                     0         0  15425
rmdir                     0         0  2357
rename                    3         3  80868
getxattr                  0         0  19051954
setxattr                  0         0  22469
process_config            0         0  2
connect                   0         0  5
reconnect                 0         0  163
disconnect                0         0  4
statfs                    0         0  102566
create                    0         0  30
destroy                   0         0  30
setattr                  23        23  1864211
getattr                 487       487  58908803
llog_init                 0         0  60
notify                    0         0  121
</sample llstat output>

Kind regards,
Frederik
--
Frederik Ferner
Computer Systems Administrator    phone: +44 1235 77 8624
Diamond Light Source Ltd.         mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
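[For readers working through a similar problem: a quick way to pull the
slowest calls out of a trace like the one above is sketched below. It
assumes GNU awk, strace's -T "<seconds>" suffix as shown in the
extract, and a placeholder directory path.]

    strace -T -e trace=file -o ls.trace ls -n /path/to/slow/dir
    # print timings over 1s, slowest first (timing is the last <...> field)
    awk -F'[<>]' 'NF > 2 && $(NF-1)+0 > 1 {print $(NF-1), $1}' ls.trace | sort -rn | head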
Daniel Kobras
2010-Oct-29 13:20 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Hi Frederik!

On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
> Doing a 'strace -T -e file ls -n' on one directory with about 750 files,
> while users were seeing the hanging ls, showed lstat calls taking
> seconds, up to 23s.

The (l)stat() calls determine the exact size of all files in the
displayed directory. This means that each OST needs to revoke client
write locks for all these files, i.e. client-side write caches for all
files in the directory are flushed before the (l)stat() returns. This
can easily take several seconds if there is heavy write activity on
the file.

Regards,

Daniel.
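[A minimal two-client sketch of the effect being described: sustained
buffered writes from one client slowing down a size-reporting ls on
another. The paths, file names, and sizes are made up, and the exact
slowdown depends on striping and caching.]

    # On client A: keep dirty data in flight on files in the directory
    for f in /lustre/testdir/file{1..20}; do
        dd if=/dev/zero of="$f" bs=1M count=1024 conv=notrunc &
    done

    # On client B: compare a name-only listing with one that stats every file
    time ls /lustre/testdir      # names only, no sizes needed
    time ls -l /lustre/testdir   # stats every file, waits on the writers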
Andreas Dilger
2010-Nov-01 04:33 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
On 2010-10-29, at 21:20, Daniel Kobras wrote:
> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>> Doing a 'strace -T -e file ls -n' on one directory with about 750 files,
>> while users were seeing the hanging ls, showed lstat calls taking
>> seconds, up to 23s.
>
> The (l)stat() calls determine the exact size of all files in the
> displayed directory. This means that each OST needs to revoke client
> write locks for all these files, i.e. client-side write caches for
> all files in the directory are flushed before the (l)stat() returns.
> This can easily take several seconds if there is heavy write activity
> on the file.

Actually, unlike most other cluster filesystems, Lustre does not need
to revoke the OST write locks in order to determine the file size. The
OST extent locks are conditionally revoked if the client is no longer
using them, but if they are in use, the clients holding those locks
only return a "glimpse" of the current file size to the OST, which in
turn returns the size to the client doing the (l)stat() call.

Since the (l)stat() call is itself not atomic (i.e. the size may be
out-of-date even before the system call returns to userspace, even on
local filesystems), this glimpse behaviour is fine for (l)stat()
calls. For system calls where the client needs to know the actual file
size (e.g. open(O_APPEND) writes, or truncate()), the client does need
to acquire the extent lock that covers the end of the file, and of
course it does so.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
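[To make the glimpse-versus-extent-lock distinction concrete, a hedged
sketch follows; the path is hypothetical, and 'truncate' here is the
GNU coreutils command. Run while another client is actively writing to
the file.]

    time stat -c %s /lustre/testdir/bigfile     # size via a glimpse; the writer keeps its lock
    time truncate -s 0 /lustre/testdir/bigfile  # needs the extent lock at EOF, so the
                                                # writer must flush and give it up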
Frederik Ferner
2010-Nov-09 14:35 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Thanks both for your replies. In the meantime we've upgraded our file
system to 1.8.3-ddn3.3, which kept us busy, so sorry for the late
reply. In our initial testing we have not seen this problem, but more
testing is in progress.

Andreas Dilger wrote:
> On 2010-10-29, at 21:20, Daniel Kobras wrote:
>> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>>> Doing a 'strace -T -e file ls -n' on one directory with about 750
>>> files, while users were seeing the hanging ls, showed lstat calls
>>> taking seconds, up to 23s.
>> The (l)stat() calls determine the exact size of all files in the
>> displayed directory. This means that each OST needs to revoke
>> client write locks for all these files, i.e. client-side write
>> caches for all files in the directory are flushed before the
>> (l)stat() returns. This can easily take several seconds if there is
>> heavy write activity on the file.

As far as I was able to see at the time, the actual files were not
being written to any more, but additional files in the same directory
or one directory below were still being created.

> Actually, unlike most other cluster filesystems Lustre does not need
> to revoke the OST write locks in order to determine the file size.
> The OST extent locks are conditionally revoked if the client is no
> longer using them, but if they are in use the clients holding those
> locks only return a "glimpse" of the current file size to the OST,
> which in turn returns the size to the client doing the (l)stat()
> call.

I assume on a busy OST this could still take a while, though?

We'll monitor the situation after the upgrade and if it happens again,
we'll try more debugging.

Thanks,
Frederik
--
Frederik Ferner
Computer Systems Administrator    phone: +44 1235 77 8624
Diamond Light Source Ltd.         mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
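[One way to do that monitoring is a small watchdog that logs whenever a
sample listing exceeds a threshold, so the next occurrence is captured
with a timestamp that can be correlated with server-side stats. This is
only a sketch; the directory, interval and threshold are placeholders.]

    #!/bin/bash
    DIR=/lustre/testdir
    THRESHOLD=5   # seconds
    while sleep 60; do
        start=$(date +%s)
        ls -l "$DIR" > /dev/null
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -gt "$THRESHOLD" ]; then
            logger -t lustre-ls-watch "ls -l $DIR took ${elapsed}s"
        fi
    done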
Wang Yibin
2010-Nov-09 14:54 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
On 2010-11-9, at 10:35, Frederik Ferner wrote:
>
> I assume on a busy OST this could still take a while, though?
>

Of course the load on the OST will affect the directory stats.

I suggest you tune the maximum statahead count if there are usually a
lot of files in a normal directory:

# echo n > /proc/fs/lustre/llite/*/statahead_max

where n <= 8192.
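[A hedged sketch of checking and raising that limit on a client. The
exact /proc files vary between releases; statahead_stats is assumed to
be present here, and 128 is just an example value (0 disables
statahead).]

    # Current limit and statahead hit/miss counters
    cat /proc/fs/lustre/llite/*/statahead_max
    cat /proc/fs/lustre/llite/*/statahead_stats

    # Raise the limit on every mounted llite instance
    for f in /proc/fs/lustre/llite/*/statahead_max; do
        echo 128 > "$f"
    done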