Hello!

We are experimenting with the new lustre-2.2.0 and, surprisingly, are not
getting any statfs performance improvements. It's generally as poor as
in previous versions. Also, statahead does not seem to make any
difference to statfs performance.

Test setup: 3 identical machines (node21 for mgsmdt, node22 for ostoss
and node23 as a client). Every machine has 2 Intel L5420 CPUs at 2.5 GHz
and 8 GB RAM. All disks are fairly fast Kingston SSDs. Nodes are
connected via QDR InfiniBand (Mellanox ConnectX2 + Voltaire switch).

On node21 and node22:
#rpm -qa | grep lustre
kernel-2.6.32-220.4.2.el6_lustre.x86_64
lustre-iokit-1.4.0-1.noarch
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
lustre-tests-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64

On node23:
#rpm -qa | grep lustre
lustre-iokit-1.4.0-1.noarch
lustre-client-tests-2.2.0-2.6.32_220.4.2.el6.x86_64.x86_64
lustre-client-2.2.0-2.6.32_220.4.2.el6.x86_64.x86_64
lustre-client-modules-2.2.0-2.6.32_220.4.2.el6.x86_64.x86_64

#cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib0(ibbond0)"

The default test lustre filesystem was created in the following steps:

On node21:
#mkfs.lustre --reformat --fsname=lustrewt --mgs --mdt /dev/vg1/mgsmdt
#mount -t lustre /dev/vg1/mgsmdt /mgsmdt_mount/

On node22:
#mkfs.lustre --reformat --ost --fsname=lustrewt --mgsnode=172.20.22.21@o2ib0 /dev/vg2/ostoss
#mount -t lustre /dev/vg2/ostoss /ostoss_mount

A simple ls -l test for a 100000 files directory was done on node23
with statahead on and off:

On node23:
#mount -t lustre 172.20.22.21@o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#mkdir /mnt/temp/a
#cd /mnt/temp/a
#for i in $(seq 1 100000) ; do echo $i ; dd if=/dev/zero of=./$i bs=4096 count=1 ; done
#umount /mnt/temp

-----statahead off-----
On node23:
#mount -t lustre 172.20.22.21@o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#echo 0 > /proc/fs/lustre/llite/lustrewt-ffff88021e427c00/statahead_max
#cd /mnt/temp/a
#time ls -l
real 0m52.751s
user 0m1.153s
sys 0m20.101s

#time ls -l
real 0m21.280s
user 0m1.086s
sys 0m11.973s
All subsequent runs complete in virtually the same time (21-24s).
#umount /mnt/temp

-----statahead on-----
#mount -t lustre 172.20.22.21@o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#cat /proc/fs/lustre/llite/lustrewt-ffff88021e427c00/statahead_max
32
#time ls -l
real 0m43.846s
user 0m1.242s
sys 0m20.444s

#time ls -l
real 0m24.000s
user 0m1.104s
sys 0m14.125s
All subsequent runs complete in virtually the same time (21-24s).

#strace -r ls -l
......
0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
0.000179 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000096 lgetxattr("77193", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available)
0.000045 lgetxattr("77193", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available)
0.000045 lstat("80570", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000232 lgetxattr("80570", "security.selinux", 0x129d820, 255) = -1 ENODATA (No data available)
0.000209 lstat("80570", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000081 lgetxattr("80570", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available)
0.000041 lgetxattr("80570", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available)
........

Selinux is disabled on all nodes.
Remounting the fs with -o noacl on all nodes does not make any
difference; ls -l still takes 21-24 secs.

After remounting with -o noacl:
#strace -r ls -l
.........
0.000043 lstat("13382", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000103 lgetxattr("13382", "security.selinux", 0xe8cfc0, 255) = -1 ENODATA (No data available)
0.000141 lstat("13382", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000093 lgetxattr("13382", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
0.000048 lstat("3014", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000121 lgetxattr("3014", "security.selinux", 0xe8cfe0, 255) = -1 ENODATA (No data available)
0.000151 lstat("3014", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
0.000091 lgetxattr("3014", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
.........

I tried to improve performance and played with many lustre parameters,
but was never able to beat the magical 20 seconds that ls -l takes. It
seems that lustre statfs just doesn't want to do more than 5000
files/second for a single client.

I'd be grateful if someone could share their real-life ls -l performance
results on a lustre filesystem. It is possible that I'm completely
missing some obvious setting, so if anybody has any idea, please let me
know.

Thanks.

--
Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.
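As a rough check of the "5000 files/second" figure, the warm-cache timings reported above can be turned into a per-second rate. This is only a sketch of the arithmetic; the helper name stat_rate is made up for illustration and is not part of any Lustre tool.

```shell
#!/bin/sh
# stat_rate FILES SECONDS
# Print how many files were handled per second, rounded to an integer.
# (Hypothetical helper, only for checking the numbers in this thread.)
stat_rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

# Warm-cache (second run) timings for the 100000-file directory.
stat_rate 100000 21.280   # statahead off
stat_rate 100000 24.000   # statahead on
```

Both results land a little under 5000 files per second, which matches the ceiling described in the message.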
On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
> We are experimenting with a new lustre-2.2.0 and surprisingly not
> getting any statfs performance improvements. It's generally as poor as
> in previous versions. Also, statahead does not seem to make any
> difference to statfs performance.
>
> Simple ls -l test for a 100000 files directory was done on the node23
> with statahead on and off:

> -----statahead off-----
> #time ls -l
> real 0m52.751s
> user 0m1.153s
> sys 0m20.101s
>
> #time ls -l
> real 0m21.280s
> user 0m1.086s
> sys 0m11.973s
> All subsequent runs complete in virtually the same time (21-24s).
>
> -----statahead on-----
> #time ls -l
> real 0m43.846s
> user 0m1.242s
> sys 0m20.444s
>
> #time ls -l
> real 0m24.000s
> user 0m1.104s
> sys 0m14.125s
> All subsequent runs complete in virtually the same time (21-24s).

It looks to me like a 25% improvement was seen - 53s to 44s for first
run. After that, the directories/inodes/locks/attributes are all in the
client cache, so statahead is no longer active (no RPCs need to be sent).

> #strace -r ls -l
> ......
> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)

This is selinux trying to fetch xattrs. A plague on selinux, IMHO, but
we can't get rid of it. The prefetching and caching of existing and
negative (i.e. non-existent) xattrs on the client is being discussed in
http://bugs.whamcloud.com/browse/LU-549.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/
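Because statahead can only help while the client cache is cold, it is useful to record the first run separately from the later ones. The sketch below times the same command several times in a row; time_runs is a hypothetical helper, and it assumes a `date` that understands `%s` (with `%N` for sub-second precision, as GNU date does).

```shell
#!/bin/sh
# time_runs COUNT COMMAND [ARGS...]
# Run COMMAND COUNT times, discarding its output, and print the wall-clock
# time of each run. Run 1 after a fresh mount is the cold-cache case; the
# following runs are served from the client cache.
# (Hypothetical helper; sub-second precision assumes GNU date's %N.)
time_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        awk -v i="$i" -v s="$start" -v e="$end" \
            'BEGIN { printf "run %d: %.3fs\n", i, e - s }'
        i=$((i + 1))
    done
}
```

For example, `time_runs 3 ls -l /mnt/temp/a` right after mounting would print one cold and two warm timings, which is the comparison the statahead on/off test needs.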
On 04/06/2012 06:40 AM, Andreas Dilger wrote:
> On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
>> We are experimenting with a new lustre-2.2.0 and surprisingly not
>> getting any statfs performance improvements. It's generally as poor as
>> in previous versions. Also, statahead does not seem to make any
>> difference to statfs performance.
>>
>> Simple ls -l test for a 100000 files directory was done on the node23
>> with statahead on and off:
>> -----statahead off-----
>> #time ls -l
>> real 0m52.751s
>> user 0m1.153s
>> sys 0m20.101s
>>
>> #time ls -l
>> real 0m21.280s
>> user 0m1.086s
>> sys 0m11.973s
>> All subsequent runs complete in virtually the same time (21-24s).
>>
>> -----statahead on-----
>> #time ls -l
>> real 0m43.846s
>> user 0m1.242s
>> sys 0m20.444s
>>
>> #time ls -l
>> real 0m24.000s
>> user 0m1.104s
>> sys 0m14.125s
>> All subsequent runs complete in virtually the same time (21-24s).
> It looks to me like a 25% improvement was seen - 53s to 44s for first
> run. After that, the directories/inodes/locks/attributes are all in the
> client cache, so statahead is no longer active (no RPCs need to be sent).
>
>> #strace -r ls -l
>> ......
>> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
>> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
> This is selinux trying to fetch xattrs. A plague on selinux, IMHO, but
> we can't get rid of it. The prefetching and caching of existing and
> negative (i.e. non-existent) xattrs on the client is being discussed in
> http://bugs.whamcloud.com/browse/LU-549.

Thanks for the reply, things are starting to be clearer. It seems that
the problem is not in selinux itself, but in lustre not caching xattrs
in general. If I understand correctly, statahead tries to read ahead
file attributes and put them in cache, so that the client does not have
to contact the mds to get them. But if we don't cache xattrs, every tool
like ls or rsync will contact the mds for every file as it tries to get
acls and selinux attributes, which can not be cached. Wouldn't it be
better to catch the case when the FS is mounted with -o noacl and
selinux is disabled on the client side and immediately return
EOPNOTSUPP without contacting the mds? We would be happy to trade acls
and xattrs for a better statfs performance.

> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Lustre Engineer            http://www.whamcloud.com/

Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.
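To see how many extra per-file calls a tool makes, the `strace -r` output can be tallied by syscall name. This is a minimal sketch: count_calls is a hypothetical helper, and it assumes the trace was saved to a file in the two-column format shown earlier in the thread (relative timestamp, then the call).

```shell
#!/bin/sh
# count_calls TRACEFILE
# Tally syscalls by name from saved `strace -r` output, where each line is
# "<relative-time> <syscall>(<args>) = <result>". Lines without a call
# (such as the "......" elisions) are skipped.
# (Hypothetical helper, not part of strace or Lustre.)
count_calls() {
    awk 'NF >= 2 && $2 ~ /\(/ {
             split($2, parts, "(")   # parts[1] is the syscall name
             calls[parts[1]]++
         }
         END { for (name in calls) print name, calls[name] }' "$1" | sort
}
```

A trace could be captured with `strace -r -o ls.trace ls -l` and summarised with `count_calls ls.trace`; in the excerpts above, each file shows two lstat calls plus two or three lgetxattr calls.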
On 2012-04-06, at 10:33 AM, Yevheniy Demchenko wrote:
> On 04/06/2012 06:40 AM, Andreas Dilger wrote:
>> On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
>>> We are experimenting with a new lustre-2.2.0 and surprisingly not
>>> getting any statfs performance improvements. It's generally as poor
>>> as in previous versions. Also, statahead does not seem to make any
>>> difference to statfs performance.
>>>
>>> #strace -r ls -l
>>> ......
>>> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
>>> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
>>
>> This is selinux trying to fetch xattrs. A plague on selinux, IMHO, but
>> we can't get rid of it. The prefetching and caching of existing and
>> negative (i.e. non-existent) xattrs on the client is being discussed
>> in http://bugs.whamcloud.com/browse/LU-549.
>>
> Thanks for reply, things are starting to be clearer. It seems, that
> problem is not in selinux itself, but in not-caching xattrs in lustre in
> general. if i understand correctly, statahead tries to read ahead file
> attributes and put them in cache, so that client would not contact mds
> for getting them. But if we don't cache xattrs, every tool like ls or
> rsync will contact mds for every file as it tries to get acls and
> selinux attributes, which can not be cached. Wouldn't it be better to
> catch the case when FS is mounted with -o noacl and selinux is disabled
> on the client side and immediately return EOPNOTSUPP without contacting
> mds? We would be happy to trade acls and xattrs for a better statfs
> performance.

Yes, that makes sense. It would probably be a simple patch in
ll_getxattr_common() if you wanted to give it a try.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/
On 04/06/2012 07:43 PM, Andreas Dilger wrote:
> On 2012-04-06, at 10:33 AM, Yevheniy Demchenko wrote:
>> On 04/06/2012 06:40 AM, Andreas Dilger wrote:
>>> On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
>>>> We are experimenting with a new lustre-2.2.0 and surprisingly not
>>>> getting any statfs performance improvements. It's generally as poor
>>>> as in previous versions. Also, statahead does not seem to make any
>>>> difference to statfs performance.
>>>>
>>>> #strace -r ls -l
>>>> ......
>>>> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
>>>> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
>>> This is selinux trying to fetch xattrs. A plague on selinux, IMHO, but
>>> we can't get rid of it. The prefetching and caching of existing and
>>> negative (i.e. non-existent) xattrs on the client is being discussed
>>> in http://bugs.whamcloud.com/browse/LU-549.
>>>
>> Thanks for reply, things are starting to be clearer. It seems, that
>> problem is not in selinux itself, but in not-caching xattrs in lustre in
>> general. if i understand correctly, statahead tries to read ahead file
>> attributes and put them in cache, so that client would not contact mds
>> for getting them. But if we don't cache xattrs, every tool like ls or
>> rsync will contact mds for every file as it tries to get acls and
>> selinux attributes, which can not be cached. Wouldn't it be better to
>> catch the case when FS is mounted with -o noacl and selinux is disabled
>> on the client side and immediately return EOPNOTSUPP without contacting
>> mds? We would be happy to trade acls and xattrs for a better statfs
>> performance.
> Yes, that makes sense. It would probably be a simple patch in
> ll_getxattr_common() if you wanted to give it a try.
>
> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Lustre Engineer            http://www.whamcloud.com/

Some new results and patch:

1. run (immediately after fs mount):
#time ls -l
real 0m33.170s
user 0m0.945s
sys 0m15.861s

2. and subsequent runs:
#time ls -l
real 0m12.134s
user 0m0.819s
sys 0m10.491s

--- ./lustre-2.2.50-master/lustre/llite/xattr.c 2012-03-16 08:20:59.000000000 +0100
+++ ./lustre-2.2.50/lustre/llite/xattr.c 2012-04-06 21:48:19.642232233 +0200
@@ -40,6 +40,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/smp_lock.h>
+#include <linux/selinux.h>
 
 #define DEBUG_SUBSYSTEM S_LLITE
 
@@ -94,7 +95,8 @@
              xattr_type == XATTR_ACL_DEFAULT_T) &&
            !(sbi->ll_flags & LL_SBI_ACL))
                 return -EOPNOTSUPP;
-
+        if (xattr_type == XATTR_SECURITY_T && !selinux_is_enabled())
+                return -ENODATA;
         if (xattr_type == XATTR_USER_T && !(sbi->ll_flags & LL_SBI_USER_XATTR))
                 return -EOPNOTSUPP;
         if (xattr_type == XATTR_TRUSTED_T && !cfs_capable(CFS_CAP_SYS_ADMIN))

Indeed, selinux doesn't do any good for lustre statfs. It seems that acl
part is already filtered out in xattr.c.
I'll try to find out where lustre client spends 12 "sys" seconds in
subsequent runs.

Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.
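The effect of the patch can be illustrated with a small userspace model of the early-return checks visible in the diff. This is only a sketch in shell, not the kernel code: xattr_shortcut and its 0/1 flag arguments are made up for illustration, and "RPC" simply stands for "no early return, the request goes on to the MDS".

```shell
#!/bin/sh
# xattr_shortcut NAME ACL SELINUX USER_XATTR
# Model of the client-side checks shown in the diff. Each flag is 1
# (enabled) or 0 (disabled). Prints the error returned locally, or "RPC"
# when none of the early returns applies.
# (Hypothetical model; the real logic lives in lustre/llite/xattr.c.)
xattr_shortcut() {
    name=$1
    acl=$2
    selinux=$3
    user_xattr=$4
    case "$name" in
        system.posix_acl_access|system.posix_acl_default)
            # ACL xattrs are refused locally when ACLs are off (-o noacl).
            if [ "$acl" -eq 0 ]; then echo EOPNOTSUPP; return 0; fi ;;
        security.*)
            # The added check: no selinux on the client, nothing to fetch.
            if [ "$selinux" -eq 0 ]; then echo ENODATA; return 0; fi ;;
        user.*)
            if [ "$user_xattr" -eq 0 ]; then echo EOPNOTSUPP; return 0; fi ;;
    esac
    echo RPC
}
```

With selinux disabled, `xattr_shortcut security.selinux 1 0 1` prints ENODATA, the same result ls saw before the patch, but now without a round trip per file.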
On 2012-04-06, at 2:09 PM, Yevheniy Demchenko wrote:
> On 04/06/2012 07:43 PM, Andreas Dilger wrote:
>> On 2012-04-06, at 10:33 AM, Yevheniy Demchenko wrote:
>>> On 04/06/2012 06:40 AM, Andreas Dilger wrote:
>>>> On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
>>>>> We are experimenting with a new lustre-2.2.0 and surprisingly not
>>>>> getting any statfs performance improvements. It's generally as poor
>>>>> as in previous versions. Also, statahead does not seem to make any
>>>>> difference to statfs performance.
>>>>>
>>>>> #strace -r ls -l
>>>>> ......
>>>>> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
>>>>> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
>>>>
>>>> This is selinux trying to fetch xattrs. A plague on selinux, IMHO,
>>>> but we can't get rid of it. The prefetching and caching of existing
>>>> and negative (i.e. non-existent) xattrs on the client is being
>>>> discussed in http://bugs.whamcloud.com/browse/LU-549.
>>>
>>> Thanks for reply, things are starting to be clearer. It seems,
>>> that problem is not in selinux itself, but in not-caching xattrs
>>> in lustre in general. if i understand correctly, statahead tries
>>> to read ahead file attributes and put them in cache, so that client
>>> would not contact MDS for getting them. But if we don't cache
>>> xattrs, every tool like ls or rsync will contact mds for every
>>> file as it tries to get acls and selinux attributes, which can not
>>> be cached. Wouldn't it be better to catch the case when FS is
>>> mounted with -o noacl and selinux is disabled on the client side
>>> and immediately return EOPNOTSUPP without contacting MDS? We would
>>> be happy to trade acls and xattrs for a better statfs performance.
>>
>> Yes, that makes sense. It would probably be a simple patch in
>> ll_getxattr_common() if you wanted to give it a try.
>
> Some new results and patch:
> 1. run (immediately after fs mount):
> #time ls -l
> real 0m33.170s
> user 0m0.945s
> sys 0m15.861s
> 2. and subsequent runs:
> #time ls -l
> real 0m12.134s
> user 0m0.819s
> sys 0m10.491s

Saving 10s for 100k files is quite respectable for such a simple patch.
Could you please submit it to Gerrit for inclusion:

http://wiki.whamcloud.com/display/PUB/Submitting+Changes
http://wiki.whamcloud.com/display/PUB/Using+Gerrit

You can use LU-549 for this change.

> --- ./lustre-2.2.50-master/lustre/llite/xattr.c 2012-03-16 08:20:59.000000000 +0100
> +++ ./lustre-2.2.50/lustre/llite/xattr.c 2012-04-06 21:48:19.642232233 +0200
> @@ -40,6 +40,7 @@
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/smp_lock.h>
> +#include <linux/selinux.h>
>
>  #define DEBUG_SUBSYSTEM S_LLITE
>
> @@ -94,7 +95,8 @@
>               xattr_type == XATTR_ACL_DEFAULT_T) &&
>             !(sbi->ll_flags & LL_SBI_ACL))
>                  return -EOPNOTSUPP;
> -
> +        if (xattr_type == XATTR_SECURITY_T && !selinux_is_enabled())
> +                return -ENODATA;
>          if (xattr_type == XATTR_USER_T && !(sbi->ll_flags & LL_SBI_USER_XATTR))
>                  return -EOPNOTSUPP;
>          if (xattr_type == XATTR_TRUSTED_T && !cfs_capable(CFS_CAP_SYS_ADMIN))
>
> Indeed, selinux doesn't do any good for lustre statfs. It seems that acl
> part is already filtered out in xattr.c.
> I'll try to find out where lustre client spends 12 "sys" seconds in
> subsequent runs.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/
On 04/06/2012 11:08 PM, Andreas Dilger wrote:
> On 2012-04-06, at 2:09 PM, Yevheniy Demchenko wrote:
>> On 04/06/2012 07:43 PM, Andreas Dilger wrote:
>>> On 2012-04-06, at 10:33 AM, Yevheniy Demchenko wrote:
>>>> On 04/06/2012 06:40 AM, Andreas Dilger wrote:
>>>>> On 2012-04-05, at 2:40 PM, Yevheniy Demchenko wrote:
>>>>>> We are experimenting with a new lustre-2.2.0 and surprisingly not
>>>>>> getting any statfs performance improvements. It's generally as
>>>>>> poor as in previous versions. Also, statahead does not seem to
>>>>>> make any difference to statfs performance.
>>>>>>
>>>>>> #strace -r ls -l
>>>>>> ......
>>>>>> 0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
>>>>>> 0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) = -1 ENODATA (No data available)
>>>>> This is selinux trying to fetch xattrs. A plague on selinux, IMHO,
>>>>> but we can't get rid of it. The prefetching and caching of existing
>>>>> and negative (i.e. non-existent) xattrs on the client is being
>>>>> discussed in http://bugs.whamcloud.com/browse/LU-549.
>>>> Thanks for reply, things are starting to be clearer. It seems,
>>>> that problem is not in selinux itself, but in not-caching xattrs
>>>> in lustre in general. if i understand correctly, statahead tries
>>>> to read ahead file attributes and put them in cache, so that client
>>>> would not contact MDS for getting them. But if we don't cache
>>>> xattrs, every tool like ls or rsync will contact mds for every
>>>> file as it tries to get acls and selinux attributes, which can not
>>>> be cached. Wouldn't it be better to catch the case when FS is
>>>> mounted with -o noacl and selinux is disabled on the client side
>>>> and immediately return EOPNOTSUPP without contacting MDS? We would
>>>> be happy to trade acls and xattrs for a better statfs performance.
>>> Yes, that makes sense. It would probably be a simple patch in
>>> ll_getxattr_common() if you wanted to give it a try.
>> Some new results and patch:
>> 1. run (immediately after fs mount):
>> #time ls -l
>> real 0m33.170s
>> user 0m0.945s
>> sys 0m15.861s
>> 2. and subsequent runs:
>> #time ls -l
>> real 0m12.134s
>> user 0m0.819s
>> sys 0m10.491s
> Saving 10s for 100k files is quite respectable for such a simple patch.
> Could you please submit it to Gerrit for inclusion:
>
> http://wiki.whamcloud.com/display/PUB/Submitting+Changes
> http://wiki.whamcloud.com/display/PUB/Using+Gerrit
>
> You can use LU-549 for this change.
>

Submitted. Whom should I add as a reviewer for this patch?

Thanks,

Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.