Hi,

I have a gluster 3.10 volume with a dir with ~1 million small files in it, say mounted at /mnt/dir with FUSE, and I'm observing something weird:

When I list and stat them all using rsync, the lstat() calls that rsync does are incredibly fast (23 microseconds per call on average, definitely faster than a network roundtrip between my 3-machine bricks connected via Ethernet).
But when I try to back them up with the `bup` tool (https://github.com/bup/bup), which (at least according to strace) does the same syscalls as rsync to stat all files, it takes 1700 microseconds per lstat() call, and the total time to go over all files is 50x higher.

These are my strace statistics:

rsync strace:

$ strace -f -w -c rsync -a --dry-run /mnt/dir /tmp/nonexisting

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.35   69.682773       26617      2618         1 select
 39.60   52.712292        1056     49907           getdents
  8.00   10.655949          11    998102           lstat
  0.02    0.022890          12      1900           read
  0.01    0.017219          21       829           write
  0.01    0.012093          14       868           munmap
  0.01    0.006656          11       606           mmap
  0.00    0.004168        1389         3           readlink
  0.00    0.001796        1796         1           chdir
  0.00    0.001019         510         2           clone
  0.00    0.000841          44        19        13 open

Took ~50 seconds real time to complete.

bup strace (I interrupted it after a while):

strace: Process 10749 attached
strace: Process 10750 attached
Indexing: 25600 (566 paths/s)
^C
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 89.55 1140.837325        1700    671016           lstat
  3.92   49.875376         934     53387           getdents
  3.58   45.655985       52238       874           read
  2.14   27.293944         789     34588      5799 open
  0.57    7.266342         141     51384           llistxattr
  0.09    1.090689          42     25692     25692 getxattr
  0.06    0.780977          26     29825      1019 close
  0.05    0.601806          23     25739     25722 ioctl
  0.03    0.373894        1851       202           select
  0.00    0.055953          14      3879           brk

real    20m52.150s
user    0m2.608s
sys     0m11.456s

Note that I passed `-c -w` to strace to measure the wall time spent in the syscalls, not system CPU time.

Using

  time strace -f -c -w ls -lU /mnt/dir > /dev/null

shows that ls is as fast as rsync (it also finishes in ~50 seconds).

(Aside: I've filed the large number of getdents() calls as https://bugzilla.redhat.com/show_bug.cgi?id=1478411, but that's not the core question of this email.)

My volume info:

# gluster vol info

Volume Name: myvol
Type: Replicate
Volume ID: ...
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/data/glusterfs/brick
Brick2: 10.0.0.2:/data/glusterfs/brick
Brick3: 10.0.0.3:/data/glusterfs/brick
Options Reconfigured:
changelog.capture-del-path: on
changelog.changelog: on
storage.build-pgfid: on
nfs.disable: on
transport.address-family: inet
client.ssl: on
server.ssl: on
storage.linux-aio: on
performance.io-thread-count: 64
performance.readdir-ahead: on
server.event-threads: 32
client.event-threads: 32
server.outstanding-rpc-limit: 64
cluster.lookup-unhashed: auto
performance.flush-behind: on
performance.strict-write-ordering: off
performance.high-prio-threads: 64
performance.normal-prio-threads: 64
performance.low-prio-threads: 64
performance.write-behind-window-size: 10MB
cluster.ensure-durability: on
performance.lazy-open: yes
cluster.use-compound-fops: off
performance.open-behind: on
features.cache-invalidation: off
performance.quick-read: off
performance.read-ahead: off
performance.stat-prefetch: off
changelog.rollover-time: 1
cluster.self-heal-daemon: enable

Questions:

What could explain why bup's lstat()s are slow and rsync's lstat()s are fast?
Also, how come rsync's lstat()s can be faster than a network roundtrip at all, given that I have all caching disabled (e.g. stat-prefetch: off, cache-invalidation: off)?

Is there some caching going on? Might bup issue its syscalls in an order that destroys this caching, while rsync and ls use a favourable order?

Thanks!
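P.S.: To make the ordering question concrete, here is a minimal, hypothetical sketch of the two traversal orders I have in mind (illustration only, not code taken from rsync or bup; the directory to walk is passed as an argument):

  #!/usr/bin/env python3
  # Hypothetical illustration of the two traversal orders discussed above;
  # not code taken from rsync or bup.
  import os
  import sys
  import time

  def list_then_stat(path):
      # rsync/"ls -lU"-like order: read the whole directory first (all the
      # getdents() calls up front), then lstat() every entry in a separate loop.
      names = os.listdir(path)
      for name in names:
          os.lstat(os.path.join(path, name))
      return len(names)

  def stat_while_listing(path):
      # Interleaved order: lstat() each entry as soon as the (lazily reading)
      # directory iterator yields it, so getdents() and lstat() calls alternate.
      n = 0
      with os.scandir(path) as entries:
          for entry in entries:
              os.lstat(entry.path)
              n += 1
      return n

  if __name__ == "__main__":
      # Run each variant against a cold cache for a fair comparison.
      directory = sys.argv[1] if len(sys.argv) > 1 else "."
      for traverse in (list_then_stat, stat_while_listing):
          t0 = time.time()
          count = traverse(directory)
          elapsed = time.time() - t0
          print("%s: %d entries in %.2f s (%.0f us per entry)"
                % (traverse.__name__, count, elapsed,
                   1e6 * elapsed / max(count, 1)))

The question is essentially whether the first order benefits from some client-side cache that the second order defeats.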
Hi Niklas,

Out of interest have you tried testing performance with performance.stat-prefetch enabled?

--
Sam McLeod
@s_mcleod
https://smcleod.net
On 15/09/17 02:45, Sam McLeod wrote:
> Out of interest have you tried testing performance
> with performance.stat-prefetch enabled?

Not yet, because I'm still struggling to understand the performance behaviour of the current, more basic setup (with it off), but it's definitely on my list and I'll report the outcome.
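P.S.: For reference, I assume turning it on is just the usual volume-set command (with the volume name from my original mail), and "off" again to revert:

  gluster volume set myvol performance.stat-prefetch on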
I did a quick test on one of my lab clusters with no tuning except for quota being enabled:

[root@dell-per730-03 ~]# gluster v info

Volume Name: vmstore
Type: Replicate
Volume ID: 0d2e4c49-334b-47c9-8e72-86a4c040a7bd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.50.1:/rhgs/brick1/vmstore
Brick2: 192.168.50.2:/rhgs/brick1/vmstore
Brick3: 192.168.50.3:/rhgs/ssd/vmstore (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
nfs.disable: on
features.inode-quota: on
features.quota: on

I ran the smallfile benchmark and created 80k 64KB files. After that I cleared caches everywhere and ran a smallfile stat test:

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 10000 --file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation stat
version                         : 3.1
hosts in test                   : None
top test directory(s)           : ['/gluster-mount/s-file']
operation                       : stat
files/thread                    : 10000
threads                         : 8
record size (KB, 0 = maximum)   : 0
file size (KB)                  : 64
file size distribution          : fixed
files per dir                   : 100
dirs per dir                    : 10
threads share directories?      : N
filename prefix                 :
filename suffix                 :
hash file number into dir.?     : N
fsync after modify?             : N
pause between files (microsec)  : 0
finish all requests?            : Y
stonewall?                      : Y
measure response times?         : N
verify read?                    : Y
verbose?                        : False
log to stderr?                  : False
ext.attr.size                   : 0
ext.attr.count                  : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 33.513184,files = 9400,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 33.322282,files = 9700,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 02,elapsed = 33.233768,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 03,elapsed = 33.145645,files = 10000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 04,elapsed = 33.974151,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 05,elapsed = 33.220816,files = 9300,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 06,elapsed = 33.304850,files = 9900,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 07,elapsed = 33.482701,files = 9600,records = 0,status = ok
total threads = 8
total files = 77100
total IOPS = 0
96.38% of requested files processed, minimum is 90.00
elapsed time = 33.974
files/sec = 2269.372389

So I was only able to stat 2269 files/sec. Given this, I think statting 1 million files (at least on my config) should take about 440 seconds. I don't know what kind of HW you are using, but 50 seconds to stat 1 million files seems faster than what I would think gluster can do.

Using ls -l is a different story:

[root@dell-per730-06-priv ~]# python /smallfile/smallfile_cli.py --files 10000 --file-size 64 --threads 8 --top /gluster-mount/s-file/ --operation ls-l
version                         : 3.1
hosts in test                   : None
top test directory(s)           : ['/gluster-mount/s-file']
operation                       : ls-l
files/thread                    : 10000
threads                         : 8
record size (KB, 0 = maximum)   : 0
file size (KB)                  : 64
file size distribution          : fixed
files per dir                   : 100
dirs per dir                    : 10
threads share directories?      : N
filename prefix                 :
filename suffix                 :
hash file number into dir.?     : N
fsync after modify?             : N
pause between files (microsec)  : 0
finish all requests?            : Y
stonewall?                      : Y
measure response times?         : N
verify read?                    : Y
verbose?                        : False
log to stderr?                  : False
ext.attr.size                   : 0
ext.attr.count                  : 0
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 00,elapsed = 2.867676,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 01,elapsed = 2.854434,files = 10000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 02,elapsed = 2.864292,files = 9000,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 03,elapsed = 2.878101,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 04,elapsed = 2.883071,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 05,elapsed = 2.885429,files = 9600,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 06,elapsed = 2.865360,files = 9500,records = 0,status = ok
host = dell-per730-06-priv.css.lab.eng.rdu2.redhat.com,thr = 07,elapsed = 2.866940,files = 9400,records = 0,status = ok
total threads = 8
total files = 76100
total IOPS = 0
95.12% of requested files processed, minimum is 90.00
elapsed time = 2.885
files/sec = 26373.893173

So here I get 26k files/sec. Based on 25k files/sec, that works out to about 40 seconds, much more like what you are seeing. Are you sure that your bup app isn't doing something like ls -l, whereas rsync is doing something like stat?

Maybe you could try the smallfile benchmark and see what it gives you. Also, make sure you clear caches on both clients and servers (echo 3 > /proc/sys/vm/drop_caches) before running your tests, to eliminate server- and client-side caching. If you are not dropping caches, this could also explain the differences you are seeing.

-b
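P.S. Concretely, that means something along these lines on every client and every brick server before each run (the hostnames below are placeholders):

  sync
  echo 3 > /proc/sys/vm/drop_caches

or, driven from one machine over ssh:

  for h in client1 server1 server2 server3; do
      ssh "root@$h" 'sync; echo 3 > /proc/sys/vm/drop_caches'
  done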
I attached my strace output for you to look at:

Smallfile stat: files/sec = 2270.307299

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.48  272.324412        3351     81274      1141 stat
 10.20   32.880871       81997       401           read
  2.78    8.957013       97359        92           write
  2.44    7.854060      436337        18           select
  0.03    0.085606          67      1277       984 open
  0.03    0.082586       13764         6           getdents
  0.02    0.048819        4882        10           unlink
  0.01    0.046833       46833         1           mkdir
  0.01    0.027127       27127         1           rmdir
  0.00    0.012896       12896         1           fsync
  0.00    0.003129           6       531           fstat
  0.00    0.002766          66        42         2 wait4

Smallfile ls -l: files/sec = 22852.145303

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.26   18.558147        3849      4822           getdents
 26.96    8.302826      461268        18           select
  4.92    1.515195        1868       811           openat
  4.83    1.486988          18     81294      1161 stat
  2.23    0.686146        7458        92           write
  0.26    0.080318        8032        10           unlink
  0.17    0.050832          40      1277       984 open
  0.10    0.030263       30263         1           rmdir
  0.08    0.023776       23776         1           fsync
  0.05    0.016408          41       401           read
  0.05    0.016061       16061         1           mkdir
  0.04    0.011154          10      1108           close
  0.01    0.003229           6       531           fstat
  0.01    0.002840          68        42         2 wait4

Look at the difference between my two stat calls:

 84.48  272.324412        3351     81274      1141 stat   <--- stat
  4.83    1.486988          18     81294      1161 stat   <--- ls -l

Maybe your two applications are behaving differently, like smallfile stat and smallfile ls-l are?

-b
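P.S. To compare more directly against a single-threaded ls/rsync walk, the same tests could presumably be repeated with one thread and correspondingly more files per thread, e.g.:

  python /smallfile/smallfile_cli.py --files 80000 --file-size 64 --threads 1 --top /gluster-mount/s-file/ --operation stat
  python /smallfile/smallfile_cli.py --files 80000 --file-size 64 --threads 1 --top /gluster-mount/s-file/ --operation ls-l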
Hi Ben,

do you know if the smallfile benchmark also does interleaved getdents() and lstat(), which is what I found to be the key difference that creates the performance gap (further down in this thread)?

Also, wouldn't `--threads 8` change the performance numbers by a factor of 8 compared with the plain `ls` and `rsync` runs that I did?

Would you mind running those commands directly/plainly on your cluster to confirm or refute my numbers?

Thanks!
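P.S.: For completeness, one way to see the ordering (rather than only the -c summary) is a timestamped trace limited to the two syscalls, for example:

  strace -f -tt -e trace=getdents,lstat rsync -a --dry-run /mnt/dir /tmp/nonexisting 2>&1 | head -100

In such a trace, long uninterrupted runs of getdents() followed by runs of lstat() indicate the batched order, while alternating getdents()/lstat() lines indicate the interleaved order.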