Frederik Ferner
2010-Oct-29 08:40 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Hi List,

we occasionally have reports from our users that it takes a very long
time to access files on our Lustre file system. In particular they
complain that they can't see the files when doing ls on the directory;
the ls takes a very long time to return. If we wait long enough,
everything returns to normal until it happens again.

The last time this happened, we managed to do some limited debugging
and would like advice on how to debug this further and hopefully fix
the problem. Note that this is still on Lustre 1.6.7.2-ddn3.5, but we
are going to upgrade to 1.8 next week. Without knowing what the
problem is, we are not convinced the upgrade will help in this
instance.

Doing a 'strace -T -e file ls -n' on one directory with about 750
files, while users were seeing the hanging ls, showed lstat calls
taking seconds, up to 23s.

<snip strace extract>
lstat("164435.dat", {st_mode=S_IFREG|0660, st_size=6711, ...}) = 0 <4.899480>
getxattr("164435.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000291>
lstat("164436.dat", {st_mode=S_IFREG|0660, st_size=7939, ...}) = 0 <13.679477>
getxattr("164436.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000045>
lstat("164443.dat", {st_mode=S_IFREG|0660, st_size=10811, ...}) = 0 <14.126075>
getxattr("164443.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000047>
lstat("164448.dat", {st_mode=S_IFREG|0660, st_size=10804, ...}) = 0 <12.062267>
getxattr("164448.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000045>
lstat("164450.dat", {st_mode=S_IFREG|0660, st_size=10815, ...}) = 0 <23.535400>
getxattr("164450.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000047>
lstat("164453.dat", {st_mode=S_IFREG|0660, st_size=10809, ...}) = 0 <8.323233>
getxattr("164453.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000043>
lstat("164462.dat", {st_mode=S_IFREG|0660, st_size=10818, ...}) = 0 <15.436090>
getxattr("164462.dat", "system.posix_acl_access", 0x0, 0) = 100 <0.000028>
</snip strace>

During all this time there was nothing in syslog on any of the
servers; MDS and client both had a load of less than 0.2. The OSSes
were busier but did not look overloaded.

llstat -i 1 /proc/fs/lustre/mds/lustre01-MDT0000/stats showed open
calls at a rate of about 700-1200 per second:

<sample llstat output>
/proc/fs/lustre/mds/lustre01-MDT0000/stats @ 1288260173.745081
Name              Cur.Count  Cur.Rate  #Events
open                    904       904  19632468
close                   428       428  6264157
mknod                     0         0  20090
link                      0         0  32
unlink                    9         9  162146
mkdir                     0         0  15425
rmdir                     0         0  2357
rename                    3         3  80868
getxattr                  0         0  19051954
setxattr                  0         0  22469
process_config            0         0  2
connect                   0         0  5
reconnect                 0         0  163
disconnect                0         0  4
statfs                    0         0  102566
create                    0         0  30
destroy                   0         0  30
setattr                  23        23  1864211
getattr                 487       487  58908803
llog_init                 0         0  60
notify                    0         0  121
</sample llstat output>

Kind regards,
Frederik
--
Frederik Ferner
Computer Systems Administrator    phone: +44 1235 77 8624
Diamond Light Source Ltd.         mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
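[For readers working through a similar problem: a quick way to pull the
slowest calls out of a trace like the one above is sketched below. It
assumes GNU awk, strace's -T "<seconds>" suffix as shown in the
extract, and a placeholder directory path.]

    strace -T -e trace=file -o ls.trace ls -n /path/to/slow/dir
    # print timings over 1s, slowest first (timing is the last <...> field)
    awk -F'[<>]' 'NF > 2 && $(NF-1)+0 > 1 {print $(NF-1), $1}' ls.trace | sort -rn | head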
Daniel Kobras
2010-Oct-29 13:20 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Hi Frederik!

On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
> Doing a 'strace -T -e file ls -n' on one directory with about 750 files,
> while users were seeing the hanging ls, showed lstat calls taking
> seconds, up to 23s.

The (l)stat() calls determine the exact size of all files in the
displayed directory. This means that each OST needs to revoke client
write locks for all these files, i.e. client-side write caches for all
files in the directory are flushed before the (l)stat() returns. This
can easily take several seconds if there is heavy write activity on
the file.

Regards,

Daniel.
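[A minimal two-client sketch of the effect being described: sustained
buffered writes from one client slowing down a size-reporting ls on
another. The paths, file names, and sizes are made up, and the exact
slowdown depends on striping and caching.]

    # On client A: keep dirty data in flight on files in the directory
    for f in /lustre/testdir/file{1..20}; do
        dd if=/dev/zero of="$f" bs=1M count=1024 conv=notrunc &
    done

    # On client B: compare a name-only listing with one that stats every file
    time ls /lustre/testdir      # names only, no sizes needed
    time ls -l /lustre/testdir   # stats every file, waits on the writers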
Andreas Dilger
2010-Nov-01 04:33 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
On 2010-10-29, at 21:20, Daniel Kobras wrote:
> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>> Doing a 'strace -T -e file ls -n' on one directory with about 750 files,
>> while users were seeing the hanging ls, showed lstat calls taking
>> seconds, up to 23s.
>
> The (l)stat() calls determine the exact size of all files in the
> displayed directory. This means that each OST needs to revoke client
> write locks for all these files, i.e. client-side write caches for
> all files in the directory are flushed before the (l)stat() returns.
> This can easily take several seconds if there is heavy write activity
> on the file.

Actually, unlike most other cluster filesystems, Lustre does not need
to revoke the OST write locks in order to determine the file size. The
OST extent locks are conditionally revoked if the client is no longer
using them, but if they are in use, the clients holding those locks
only return a "glimpse" of the current file size to the OST, which in
turn returns the size to the client doing the (l)stat() call.

Since the (l)stat() call is itself not atomic (i.e. the size may be
out-of-date even before the system call returns to userspace, even on
local filesystems), this glimpse behaviour is fine for (l)stat()
calls. For system calls where the client needs to know the actual file
size (e.g. open(O_APPEND) writes, or truncate()), the client does need
to acquire the extent lock that covers the end of the file, and of
course it does so.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
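[To make the glimpse-versus-extent-lock distinction concrete, a hedged
sketch follows; the path is hypothetical, and 'truncate' here is the
GNU coreutils command. Run while another client is actively writing to
the file.]

    time stat -c %s /lustre/testdir/bigfile     # size via a glimpse; the writer keeps its lock
    time truncate -s 0 /lustre/testdir/bigfile  # needs the extent lock at EOF, so the
                                                # writer must flush and give it up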
Frederik Ferner
2010-Nov-09 14:35 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
Thanks both for your replies. In the meantime we've upgraded our file
system to 1.8.3-ddn3.3, which kept us busy, so sorry for the late
reply. In our initial testing we have not seen this problem, but more
testing is in progress.

Andreas Dilger wrote:
> On 2010-10-29, at 21:20, Daniel Kobras wrote:
>> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>>> Doing a 'strace -T -e file ls -n' on one directory with about 750
>>> files, while users were seeing the hanging ls, showed lstat calls
>>> taking seconds, up to 23s.
>> The (l)stat() calls determine the exact size of all files in the
>> displayed directory. This means that each OST needs to revoke
>> client write locks for all these files, i.e. client-side write
>> caches for all files in the directory are flushed before the
>> (l)stat() returns. This can easily take several seconds if there is
>> heavy write activity on the file.

As far as I was able to see at the time, the actual files were not
being written to any more, but additional files in the same directory
or one directory below were still being created.

> Actually, unlike most other cluster filesystems Lustre does not need
> to revoke the OST write locks in order to determine the file size.
> The OST extent locks are conditionally revoked if the client is no
> longer using them, but if they are in use the clients holding those
> locks only return a "glimpse" of the current file size to the OST,
> which in turn returns the size to the client doing the (l)stat()
> call.

I assume on a busy OST this could still take a while, though?

We'll monitor the situation after the upgrade and if it happens again,
we'll try more debugging.

Thanks,
Frederik
--
Frederik Ferner
Computer Systems Administrator    phone: +44 1235 77 8624
Diamond Light Source Ltd.         mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)
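[One way to do that monitoring is a small watchdog that logs whenever a
sample listing exceeds a threshold, so the next occurrence is captured
with a timestamp that can be correlated with server-side stats. This is
only a sketch; the directory, interval and threshold are placeholders.]

    #!/bin/bash
    DIR=/lustre/testdir
    THRESHOLD=5   # seconds
    while sleep 60; do
        start=$(date +%s)
        ls -l "$DIR" > /dev/null
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -gt "$THRESHOLD" ]; then
            logger -t lustre-ls-watch "ls -l $DIR took ${elapsed}s"
        fi
    done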
Wang Yibin
2010-Nov-09 14:54 UTC
[Lustre-discuss] very slow directory access (lstat taking >20s to return)
On 2010-11-9, at 10:35, Frederik Ferner wrote:
>
> I assume on a busy OST this could still take a while, though?
>

Of course the load on the OST will affect the directory stats.

I suggest you tune the maximum statahead count if there are usually a
lot of files in a normal directory:

# echo n > /proc/fs/lustre/llite/*/statahead_max

where n <= 8192.
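[A hedged sketch of checking and raising that limit on a client. The
exact /proc files vary between releases; statahead_stats is assumed to
be present here, and 128 is just an example value (0 disables
statahead).]

    # Current limit and statahead hit/miss counters
    cat /proc/fs/lustre/llite/*/statahead_max
    cat /proc/fs/lustre/llite/*/statahead_stats

    # Raise the limit on every mounted llite instance
    for f in /proc/fs/lustre/llite/*/statahead_max; do
        echo 128 > "$f"
    done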