Hello, We''ve started getting complaints from users about exceedingly slow lstats on directories of any size. The back-end storage is not overloaded (MDT is all but completely unloaded, the OSTs aren''t doing much beyond 500MB/s (aggregate, DDN 9900s). The load on the OSSs is also inexplicably high (consistently above 50). Strace is showing the majority of time being spent on failing getxattrs. We''ve alleviated much of the impact of this slowness by unaliasing color ls''s, but that''s merely a work-around. Any ideas at first glance? 0.001222 lstat("/global/scratch/jwhite/backuptest/highcount/3/a2", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.664400 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a2", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000040 lstat("/global/scratch/jwhite/backuptest/highcount/3/a5", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000040 lstat("/global/scratch/jwhite/backuptest/highcount/3/a10", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.387650 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a10", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000040 lstat("/global/scratch/jwhite/backuptest/highcount/3/a11", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.376490 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a11", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000049 lstat("/global/scratch/jwhite/backuptest/highcount/3/a22", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 1.134287 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a22", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000045 lstat("/global/scratch/jwhite/backuptest/highcount/3/a28", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.540547 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a28", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000046 lstat("/global/scratch/jwhite/backuptest/highcount/3/a29", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.572334 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a29", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) 0.000040 lstat("/global/scratch/jwhite/backuptest/highcount/3/a36", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 0.392290 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a36", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) ---------------- John White HPC Systems Engineer (510) 486-7307 One Cyclotron Rd, MS: 50C-3396 Lawrence Berkeley National Lab Berkeley, CA 94720
Well, I was reading the strace wrong anyway: lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.134326> getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000018> lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.158898> getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000019> lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.239466> getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000012> lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.130146> getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000012> The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s. ---------------- John White HPC Systems Engineer (510) 486-7307 One Cyclotron Rd, MS: 50C-3396 Lawrence Berkeley National Lab Berkeley, CA 94720 On Jan 20, 2012, at 2:28 PM, Mark Hahn wrote:>> I''m sorry, I''m not quite understanding what you''re asking. I don''t have >> ACLs specifically enabled anywhere (and would expect the default is >> disabled). > > I guess what I was suggesting is that you could try a simple experiment: > mount a client requesting the acl mount option. (I don''t know whether the "mount -oremount,acl ..." trick will work with Lustre.) > > if the problem goes away, you''re done. > > you also mentioned OSS load - I don''t see how that could be related, > since OSSs are not involved in metadata operations like lstat or getxattr. > (though depending on Lustre version, they could be involved in fetching > actual size of files, which is especially salient on striped files...) > > > >> ---------------- >> John White >> HPC Systems Engineer >> (510) 486-7307 >> One Cyclotron Rd, MS: 50C-3396 >> Lawrence Berkeley National Lab >> Berkeley, CA 94720 >> >> On Jan 20, 2012, at 12:49 PM, Mark Hahn wrote: >> >>>> 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) >>> >>> are your clients mounting with the acl option >>> and acl isn''t missing on the mds mount? >> > > -- > operator may differ from spokesperson. hahn at mcmaster.ca
On 1/21/12 6:35 AM, John White wrote:> Well, I was reading the strace wrong anyway: > lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.134326> > getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000018> > lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.158898> > getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000019> > lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.239466> > getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.130146> > getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > > The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s.Which version Lustre they are using? for further investigation, you can collect detailed Lustre log on some client with the same test you did before on it to check which operation(s) slowdown the process. Cheers, Nasf> > ---------------- > John White > HPC Systems Engineer > (510) 486-7307 > One Cyclotron Rd, MS: 50C-3396 > Lawrence Berkeley National Lab > Berkeley, CA 94720 > > On Jan 20, 2012, at 2:28 PM, Mark Hahn wrote: > >>> I''m sorry, I''m not quite understanding what you''re asking. I don''t have >>> ACLs specifically enabled anywhere (and would expect the default is >>> disabled). >> I guess what I was suggesting is that you could try a simple experiment: >> mount a client requesting the acl mount option. (I don''t know whether the "mount -oremount,acl ..." trick will work with Lustre.) >> >> if the problem goes away, you''re done. >> >> you also mentioned OSS load - I don''t see how that could be related, >> since OSSs are not involved in metadata operations like lstat or getxattr. >> (though depending on Lustre version, they could be involved in fetching >> actual size of files, which is especially salient on striped files...) >> >> >> >>> ---------------- >>> John White >>> HPC Systems Engineer >>> (510) 486-7307 >>> One Cyclotron Rd, MS: 50C-3396 >>> Lawrence Berkeley National Lab >>> Berkeley, CA 94720 >>> >>> On Jan 20, 2012, at 12:49 PM, Mark Hahn wrote: >>> >>>>> 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) >>>> are your clients mounting with the acl option >>>> and acl isn''t missing on the mds mount? >> -- >> operator may differ from spokesperson. hahn at mcmaster.ca > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss
What is your lustre version? It looks like LU-15. Thanks WangDi On 01/20/2012 02:35 PM, John White wrote:> Well, I was reading the strace wrong anyway: > lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.134326> > getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000018> > lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.158898> > getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000019> > lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.239466> > getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.130146> > getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > > The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s. > ---------------- > John White > HPC Systems Engineer > (510) 486-7307 > One Cyclotron Rd, MS: 50C-3396 > Lawrence Berkeley National Lab > Berkeley, CA 94720 > > On Jan 20, 2012, at 2:28 PM, Mark Hahn wrote: > >>> I''m sorry, I''m not quite understanding what you''re asking. I don''t have >>> ACLs specifically enabled anywhere (and would expect the default is >>> disabled). >> I guess what I was suggesting is that you could try a simple experiment: >> mount a client requesting the acl mount option. (I don''t know whether the "mount -oremount,acl ..." trick will work with Lustre.) >> >> if the problem goes away, you''re done. >> >> you also mentioned OSS load - I don''t see how that could be related, >> since OSSs are not involved in metadata operations like lstat or getxattr. >> (though depending on Lustre version, they could be involved in fetching >> actual size of files, which is especially salient on striped files...) >> >> >> >>> ---------------- >>> John White >>> HPC Systems Engineer >>> (510) 486-7307 >>> One Cyclotron Rd, MS: 50C-3396 >>> Lawrence Berkeley National Lab >>> Berkeley, CA 94720 >>> >>> On Jan 20, 2012, at 12:49 PM, Mark Hahn wrote: >>> >>>>> 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) >>>> are your clients mounting with the acl option >>>> and acl isn''t missing on the mds mount? >> -- >> operator may differ from spokesperson. hahn at mcmaster.ca > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Fri, Jan 20, 2012 at 02:35:19PM -0800, John White wrote:>Well, I was reading the strace wrong anyway: >lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.134326> >getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000018> >lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.158898> >getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000019> >lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.239466> >getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000012> >lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.130146> >getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000012> > >The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s.it used to be that weird slowdowns and high load could be caused by kernel zone_reclaim confusion, so firstly I''d suggest checking that vm.zone_reclaim_mode=0 everywhere (clients and servers). after that see if turning off read & write_through caches on OSS''s helps metadata rates. there''s a fair chance that streaming i/o to OSS''s is filling OSS ram and pushing inodes/dentries out of OSS vfs cache causing big metadata slowdowns - the more streaming i/o the greater the slowdown. if turning off the data caches fixes the problem for you (ie. it''s not faulty hardware or an old lustre version or something else) then there are couple of different methods that could let you get both data caching and good metadata rates, but first things first... cheers, robin -- Dr Robin Humble, HPC Systems Analyst, NCI National Facility
Oh, that''s very interesting. We''re at 1.8.4-ddn2.1. I''ll keep moving forward with our migration to 1.8.6-wc and see what happens. ---------------- John White HPC Systems Engineer (510) 486-7307 One Cyclotron Rd, MS: 50C-3396 Lawrence Berkeley National Lab Berkeley, CA 94720 On Jan 21, 2012, at 12:57 AM, wangdi wrote:> What is your lustre version? It looks like LU-15. > > Thanks > WangDi > On 01/20/2012 02:35 PM, John White wrote: >> Well, I was reading the strace wrong anyway: >> lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.134326> >> getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000018> >> lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.158898> >> getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000019> >> lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.239466> >> getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> >> lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.130146> >> getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> >> >> The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s. >> ---------------- >> John White >> HPC Systems Engineer >> (510) 486-7307 >> One Cyclotron Rd, MS: 50C-3396 >> Lawrence Berkeley National Lab >> Berkeley, CA 94720 >> >> On Jan 20, 2012, at 2:28 PM, Mark Hahn wrote: >> >>>> I''m sorry, I''m not quite understanding what you''re asking. I don''t have >>>> ACLs specifically enabled anywhere (and would expect the default is >>>> disabled). >>> I guess what I was suggesting is that you could try a simple experiment: >>> mount a client requesting the acl mount option. (I don''t know whether the "mount -oremount,acl ..." trick will work with Lustre.) >>> >>> if the problem goes away, you''re done. >>> >>> you also mentioned OSS load - I don''t see how that could be related, >>> since OSSs are not involved in metadata operations like lstat or getxattr. >>> (though depending on Lustre version, they could be involved in fetching >>> actual size of files, which is especially salient on striped files...) >>> >>> >>> >>>> ---------------- >>>> John White >>>> HPC Systems Engineer >>>> (510) 486-7307 >>>> One Cyclotron Rd, MS: 50C-3396 >>>> Lawrence Berkeley National Lab >>>> Berkeley, CA 94720 >>>> >>>> On Jan 20, 2012, at 12:49 PM, Mark Hahn wrote: >>>> >>>>>> 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) >>>>> are your clients mounting with the acl option >>>>> and acl isn''t missing on the mds mount? >>> -- >>> operator may differ from spokesperson. hahn at mcmaster.ca >> _______________________________________________ >> Lustre-discuss mailing list >> Lustre-discuss at lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/lustre-discuss > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss
Hello, It is probably because heavy load of OSS cause the slow response of stat(fetching the file size). Did you see some slow IO message on OSS? On LU-15, http://jira.whamcloud.com/browse/LU-15, disable the readonly cache might be the first thing you need to do. There are some detail discussion about what you should do to mitigate this problem. (See comments around April 15th). Thanks WangDi On 01/20/2012 02:35 PM, John White wrote:> Well, I was reading the strace wrong anyway: > lstat("../403/a323", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.134326> > getxattr("../403/a323", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000018> > lstat("../403/a330", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.158898> > getxattr("../403/a330", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000019> > lstat("../403/a331", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.239466> > getxattr("../403/a331", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > lstat("../403/a332", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0<0.130146> > getxattr("../403/a332", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)<0.000012> > > The getxattr takes an incredibly short amount of time, it''s the lstat itself that''s taking 0.1+s. > ---------------- > John White > HPC Systems Engineer > (510) 486-7307 > One Cyclotron Rd, MS: 50C-3396 > Lawrence Berkeley National Lab > Berkeley, CA 94720 > > On Jan 20, 2012, at 2:28 PM, Mark Hahn wrote: > >>> I''m sorry, I''m not quite understanding what you''re asking. I don''t have >>> ACLs specifically enabled anywhere (and would expect the default is >>> disabled). >> I guess what I was suggesting is that you could try a simple experiment: >> mount a client requesting the acl mount option. (I don''t know whether the "mount -oremount,acl ..." trick will work with Lustre.) >> >> if the problem goes away, you''re done. >> >> you also mentioned OSS load - I don''t see how that could be related, >> since OSSs are not involved in metadata operations like lstat or getxattr. >> (though depending on Lustre version, they could be involved in fetching >> actual size of files, which is especially salient on striped files...) >> >> >> >>> ---------------- >>> John White >>> HPC Systems Engineer >>> (510) 486-7307 >>> One Cyclotron Rd, MS: 50C-3396 >>> Lawrence Berkeley National Lab >>> Berkeley, CA 94720 >>> >>> On Jan 20, 2012, at 12:49 PM, Mark Hahn wrote: >>> >>>>> 0.916908 getxattr("/global/scratch/jwhite/backuptest/highcount/3/a5", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) >>>> are your clients mounting with the acl option >>>> and acl isn''t missing on the mds mount? >> -- >> operator may differ from spokesperson. hahn at mcmaster.ca > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss