Hi,

I have a gluster 3.10 volume with a dir with ~1 million small files in it, say mounted at /mnt/dir with FUSE, and I'm observing something weird:

When I list and stat them all using rsync, the lstat() calls that rsync does are incredibly fast (23 microseconds per call on average, definitely faster than a network roundtrip between my 3-machine bricks connected via Ethernet).
But when I try to back them up with the `bup` tool (https://github.com/bup/bup), which (at least according to strace) does the same syscalls as rsync to stat all files, it takes 1700 microseconds per lstat() call, and the total time to go over all files is 50x higher.

These are my strace statistics:

rsync strace:

$ strace -f -w -c rsync -a --dry-run /mnt/dir /tmp/nonexisting

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.35   69.682773       26617      2618         1 select
 39.60   52.712292        1056     49907           getdents
  8.00   10.655949          11    998102           lstat
  0.02    0.022890          12      1900           read
  0.01    0.017219          21       829           write
  0.01    0.012093          14       868           munmap
  0.01    0.006656          11       606           mmap
  0.00    0.004168        1389         3           readlink
  0.00    0.001796        1796         1           chdir
  0.00    0.001019         510         2           clone
  0.00    0.000841          44        19        13 open

Took ~50 seconds real time to complete.

bup strace (I interrupted it after a while):

strace: Process 10749 attached
strace: Process 10750 attached
Indexing: 25600 (566 paths/s)
^C
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 89.55 1140.837325        1700    671016           lstat
  3.92   49.875376         934     53387           getdents
  3.58   45.655985       52238       874           read
  2.14   27.293944         789     34588      5799 open
  0.57    7.266342         141     51384           llistxattr
  0.09    1.090689          42     25692     25692 getxattr
  0.06    0.780977          26     29825      1019 close
  0.05    0.601806          23     25739     25722 ioctl
  0.03    0.373894        1851       202           select
  0.00    0.055953          14      3879           brk

real    20m52.150s
user    0m2.608s
sys     0m11.456s

Note that I passed `-c -w` to strace to measure the wall time spent in the syscalls, not system CPU time.

Using

  time strace -f -c -w ls -lU /mnt/dir > /dev/null

shows that ls is as fast as rsync (it also finishes in ~50 seconds).

(Aside: I've filed the large number of getdents() calls as https://bugzilla.redhat.com/show_bug.cgi?id=1478411, but that's not the core question of this email.)

My volume info:

# gluster vol info

Volume Name: myvol
Type: Replicate
Volume ID: ...
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/data/glusterfs/brick
Brick2: 10.0.0.2:/data/glusterfs/brick
Brick3: 10.0.0.3:/data/glusterfs/brick
Options Reconfigured:
changelog.capture-del-path: on
changelog.changelog: on
storage.build-pgfid: on
nfs.disable: on
transport.address-family: inet
client.ssl: on
server.ssl: on
storage.linux-aio: on
performance.io-thread-count: 64
performance.readdir-ahead: on
server.event-threads: 32
client.event-threads: 32
server.outstanding-rpc-limit: 64
cluster.lookup-unhashed: auto
performance.flush-behind: on
performance.strict-write-ordering: off
performance.high-prio-threads: 64
performance.normal-prio-threads: 64
performance.low-prio-threads: 64
performance.write-behind-window-size: 10MB
cluster.ensure-durability: on
performance.lazy-open: yes
cluster.use-compound-fops: off
performance.open-behind: on
features.cache-invalidation: off
performance.quick-read: off
performance.read-ahead: off
performance.stat-prefetch: off
changelog.rollover-time: 1
cluster.self-heal-daemon: enable

Questions:

What could explain why bup's lstat()s are slow and rsync's lstat()s are fast?
Also, how come rsync's lstat()s can be faster than a network roundtrip at all, given that I have all caching disabled (e.g. stat-prefetch: off, cache-invalidation: off)?

Is there some caching going on? Might bup issue its syscalls in an order that destroys this caching, while rsync and ls use a favourable order?

Thanks!
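P.S.: To make the ordering question concrete, here is a minimal, hypothetical sketch of the two traversal orders I have in mind (illustration only, not code taken from rsync or bup; the directory to walk is passed as an argument):

  #!/usr/bin/env python3
  # Hypothetical illustration of the two traversal orders discussed above;
  # not code taken from rsync or bup.
  import os
  import sys
  import time

  def list_then_stat(path):
      # rsync/"ls -lU"-like order: read the whole directory first (all the
      # getdents() calls up front), then lstat() every entry in a separate loop.
      names = os.listdir(path)
      for name in names:
          os.lstat(os.path.join(path, name))
      return len(names)

  def stat_while_listing(path):
      # Interleaved order: lstat() each entry as soon as the (lazily reading)
      # directory iterator yields it, so getdents() and lstat() calls alternate.
      n = 0
      with os.scandir(path) as entries:
          for entry in entries:
              os.lstat(entry.path)
              n += 1
      return n

  if __name__ == "__main__":
      # Run each variant against a cold cache for a fair comparison.
      directory = sys.argv[1] if len(sys.argv) > 1 else "."
      for traverse in (list_then_stat, stat_while_listing):
          t0 = time.time()
          count = traverse(directory)
          elapsed = time.time() - t0
          print("%s: %d entries in %.2f s (%.0f us per entry)"
                % (traverse.__name__, count, elapsed,
                   1e6 * elapsed / max(count, 1)))

The question is essentially whether the first order benefits from some client-side cache that the second order defeats.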
Hi Niklas,

Out of interest have you tried testing performance with performance.stat-prefetch enabled?

--
Sam McLeod
@s_mcleod
https://smcleod.net
On 15/09/17 02:45, Sam McLeod wrote:
> Out of interest have you tried testing performance
> with performance.stat-prefetch enabled?

Not yet, because I'm still struggling to understand the performance behaviour of the current, more basic setup (with it off), but it's definitely on my list and I'll report the outcome.
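P.S.: For reference, I assume turning it on is just the usual volume-set command (with the volume name from my original mail), and "off" again to revert:

  gluster volume set myvol performance.stat-prefetch on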
I did a quick test on one of my lab clusters with no tuning except for quota being enabled:

[root@dell-per730-03 ~]# gluster v info

Volume Name: vmstore
Type: Replicate
Volume ID: 0d2e4c49-334b-47c9-8e72-86a4c040a7bd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.50.1:/rhgs/brick1/vmstore
Brick2: 192.168.50.2:/rhgs/brick1/vmstore
Brick3: 192.168.50.3:/rhgs/ssd/vmstore (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
nfs.disable: on
features.inode-quota: on
features.quota: on

I ran the smallfile benchmark and created 80k 64KB files. After that I cleared caches everywhere and ran a smallfile stat test:

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 10000 --file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation stat
version                         : 3.1
hosts in test                   : None
top test directory(s)           : ['/gluster-mount/s-file']
operation                       : stat
files/thread                    : 10000
threads                         : 8
record size (KB, 0 = maximum)   : 0
file size (KB)                  : 64
file size distribution          : fixed
files per dir                   : 100
dirs per dir                    : 10
threads share directories?      : N
filename prefix                 :
filename suffix                 :
hash file number into dir.?     : N
fsync after modify?             : N
pause between files (microsec)  : 0
finish all requests?            : Y
stonewall?                      : Y
measure response times?         : N
verify read?                    : Y
verbose?                        : False
log to stderr?                  : False
ext.attr.size                   : 0
ext.attr.count                  : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 33.513184,files = 9400,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 33.322282,files = 9700,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 02,elapsed = 33.233768,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 03,elapsed = 33.145645,files = 10000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 04,elapsed = 33.974151,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 05,elapsed = 33.220816,files = 9300,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 06,elapsed = 33.304850,files = 9900,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 07,elapsed = 33.482701,files = 9600,records = 0,status = ok
total threads = 8
total files = 77100
total IOPS = 0
96.38% of requested files processed, minimum is 90.00
elapsed time = 33.974
files/sec = 2269.372389

So I was only able to stat 2269 files/sec. Given this, I think statting 1 million files (at least on my config) should take about 440 seconds. I don't know what kind of HW you are using, but 50 seconds to stat 1 million files seems faster than what I would think gluster can do.

Using ls -l is a different story:

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 10000 --file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation ls-l
version                         : 3.1
hosts in test                   : None
top test directory(s)           : ['/gluster-mount/s-file']
operation                       : ls-l
files/thread                    : 10000
threads                         : 8
record size (KB, 0 = maximum)   : 0
file size (KB)                  : 64
file size distribution          : fixed
files per dir                   : 100
dirs per dir                    : 10
threads share directories?      : N
filename prefix                 :
filename suffix                 :
hash file number into dir.?     : N
fsync after modify?             : N
pause between files (microsec)  : 0
finish all requests?            : Y
stonewall?                      : Y
measure response times?         : N
verify read?                    : Y
verbose?                        : False
log to stderr?                  : False
ext.attr.size                   : 0
ext.attr.count                  : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 2.867676,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 2.854434,files = 10000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 02,elapsed = 2.864292,files = 9000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 03,elapsed = 2.878101,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 04,elapsed = 2.883071,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 05,elapsed = 2.885429,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 06,elapsed = 2.865360,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 07,elapsed = 2.866940,files = 9400,records = 0,status = ok
total threads = 8
total files = 76100
total IOPS = 0
95.12% of requested files processed, minimum is 90.00
elapsed time = 2.885
files/sec = 26373.893173

So here I get 26k files/sec. Based on 25k files/sec, that works out to about 40 seconds, much more like what you are seeing. Are you sure that your bup app isn't doing something like ls -l, whereas rsync is doing something like stat?

Maybe you could try the smallfile benchmark and see what it gives you. Also, make sure you clear caches on both clients and servers (echo 3 > /proc/sys/vm/drop_caches) before running your tests, to eliminate server- and client-side caching. If you are not dropping caches, this could also explain the differences you are seeing.

-b
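P.S. Concretely, that means something along these lines on every client and every brick server before each run (the hostnames below are placeholders):

  sync
  echo 3 > /proc/sys/vm/drop_caches

or, driven from one machine over ssh:

  for h in client1 server1 server2 server3; do
      ssh "root@$h" 'sync; echo 3 > /proc/sys/vm/drop_caches'
  done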
I attached my strace output for you to look at:

Smallfile stat: files/sec = 2270.307299

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.48  272.324412        3351     81274      1141 stat
 10.20   32.880871       81997       401           read
  2.78    8.957013       97359        92           write
  2.44    7.854060      436337        18           select
  0.03    0.085606          67      1277       984 open
  0.03    0.082586       13764         6           getdents
  0.02    0.048819        4882        10           unlink
  0.01    0.046833       46833         1           mkdir
  0.01    0.027127       27127         1           rmdir
  0.00    0.012896       12896         1           fsync
  0.00    0.003129           6       531           fstat
  0.00    0.002766          66        42         2 wait4

Smallfile ls -l: files/sec = 22852.145303

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.26   18.558147        3849      4822           getdents
 26.96    8.302826      461268        18           select
  4.92    1.515195        1868       811           openat
  4.83    1.486988          18     81294      1161 stat
  2.23    0.686146        7458        92           write
  0.26    0.080318        8032        10           unlink
  0.17    0.050832          40      1277       984 open
  0.10    0.030263       30263         1           rmdir
  0.08    0.023776       23776         1           fsync
  0.05    0.016408          41       401           read
  0.05    0.016061       16061         1           mkdir
  0.04    0.011154          10      1108           close
  0.01    0.003229           6       531           fstat
  0.01    0.002840          68        42         2 wait4

Look at the difference between my two stat calls:

 84.48  272.324412        3351     81274      1141 stat   <--- stat
  4.83    1.486988          18     81294      1161 stat   <--- ls -l

Maybe your two applications are behaving differently, like smallfile stat and smallfile ls-l are?

-b
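P.S. To compare more directly against a single-threaded ls/rsync walk, the same tests could presumably be repeated with one thread and correspondingly more files per thread, e.g.:

  python /smallfile/smallfile_cli.py --files 80000 --file-size 64 --threads 1 --top /gluster-mount/s-file/ --operation stat
  python /smallfile/smallfile_cli.py --files 80000 --file-size 64 --threads 1 --top /gluster-mount/s-file/ --operation ls-l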
Hi Ben,

do you know if the smallfile benchmark also does interleaved getdents() and lstat(), which is what I found to be the key difference that creates the performance gap (further down in this thread)?

Also, wouldn't `--threads 8` change the performance numbers by a factor of 8 compared with the plain `ls` and `rsync` runs that I did?

Would you mind running those commands directly/plainly on your cluster to confirm or refute my numbers?

Thanks!
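P.S.: For completeness, one way to see the ordering (rather than only the -c summary) is a timestamped trace limited to the two syscalls, for example:

  strace -f -tt -e trace=getdents,lstat rsync -a --dry-run /mnt/dir /tmp/nonexisting 2>&1 | head -100

In such a trace, long uninterrupted runs of getdents() followed by runs of lstat() indicate the batched order, while alternating getdents()/lstat() lines indicate the interleaved order.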