Hi Kashif,

Thank you for your feedback! Do you have some data on the nature of the
performance improvement observed with 3.11 in the new setup?

Adding Raghavendra and Poornima for validation of the configuration and for
help with identifying why certain files disappeared from the mount point
after enabling readdir-optimize.

Regards,
Vijay

On 07/11/2017 11:06 AM, mohammad kashif wrote:
> Hi Vijay and Experts
>
> I didn't want to experiment with my production setup, so I started a
> parallel system with two servers and around 80TB of storage. I first
> configured it with gluster 3.8 and saw the same lookup performance issue.
> I then upgraded to 3.11 as you suggested and it made a huge improvement
> in lookup time. I also did some more optimization as suggested in other
> threads.
> Now I am going to update my production servers. I am planning to use the
> following optimization options; it would be very useful if you could
> point out any inconsistency or suggest other options. My production
> setup has 5 servers with 400TB of storage and around 80 million files of
> varying sizes.
>
> Options Reconfigured:
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> cluster.readdir-optimize: off
> performance.client-io-threads: on
> performance.cache-size: 1GB
> performance.parallel-readdir: on
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> auth.allow: 163.1.136.*
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
>
> I found that setting cluster.readdir-optimize to 'on' made some files
> disappear from the client!
>
> Thanks
>
> Kashif
>
> On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>
> Hi Mohammad,
>
> A lot of time is being spent in addressing metadata calls, as expected.
> Can you consider testing with 3.11, which has the md-cache [1] and
> readdirp [2] improvements?
>
> Adding Poornima and Raghavendra, who worked on these enhancements, to
> help out further.
>
> Thanks,
> Vijay
>
> [1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/
> [2] https://github.com/gluster/glusterfs/issues/166
>
> On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>
> Hi Vijay
>
> Did you manage to look into the gluster profile logs?
>
> Thanks
> Kashif
>
> On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif <kashif.alig at gmail.com> wrote:
>
> Hi Vijay
>
> I have enabled client profiling and used this script
> https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
> to extract the data. I am attaching the output files. I don't have any
> reference data to compare my output with; hopefully you can make some
> sense of it.
>
> On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>
> Would it be possible for you to turn on client profiling and then run
> du? Instructions for turning on client profiling can be found at [1].
> Providing the client profile information can help us figure out where
> the latency could be stemming from.
>
> Regards,
> Vijay
>
> [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
>
> On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>
> Hi Vijay
>
> Thanks for your quick response. I am using gluster 3.8.11 on CentOS 7
> servers:
> glusterfs-3.8.11-1.el7.x86_64
>
> The clients are CentOS 6, but I tested with a CentOS 7 client as well
> and the results didn't change.
>
> gluster volume info
> Volume Name: atlasglust
> Type: Distribute
> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 5
> Transport-type: tcp
> Bricks:
> Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
> Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
> Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
> Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
> Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> auth.allow: x.y.z
>
> I am not using directory quota.
>
> Please let me know if you require any more info.
>
> Thanks
> Kashif
>
> On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>
> Can you please provide more details about your volume configuration and
> the version of gluster that you are using?
>
> Regards,
> Vijay
>
> On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>
> Hi
>
> I have just moved our 400 TB HPC storage from Lustre to Gluster. It is
> part of a research institute, and users have files ranging from very
> small to big (a few KB to 20GB). Our setup consists of 5 servers, each
> with 96TB of RAID 6 disks. All servers are connected through 10G
> Ethernet, but not all clients are. Gluster volumes are distributed
> without any replication. There are approximately 80 million files in
> the file system.
> I am mounting using glusterfs on the clients.
>
> I have copied everything from Lustre to Gluster, but the old file
> system still exists, so I can compare.
>
> The problem I am facing is an extremely slow du even on a small
> directory. Also, the time taken differs substantially each time. I
> tried du from the same client on a particular directory twice and got
> these results:
>
> time du -sh /data/aa/bb/cc
> 3.7G /data/aa/bb/cc
> real 7m29.243s
> user 0m1.448s
> sys 0m7.067s
>
> time du -sh /data/aa/bb/cc
> 3.7G /data/aa/bb/cc
> real 16m43.735s
> user 0m1.097s
> sys 0m5.802s
>
> 16m and 7m is too long for a 3.7G directory. I must mention that the
> directory contains a huge number of files (208736).
>
> But running du on the same directory on the old file system gives this
> result:
>
> time du -sh /olddata/aa/bb/cc
> 4.0G /olddata/aa/bb/cc
> real 3m1.255s
> user 0m0.755s
> sys 0m38.099s
>
> It is much better if I run the same command again:
>
> time du -sh /olddata/aa/bb/cc
> 4.0G /olddata/aa/bb/cc
> real 0m8.309s
> user 0m0.313s
> sys 0m7.755s
>
> Is there anything I can do to improve this performance? I would also
> like to hear from someone who is running the same kind of setup.
>
> Thanks
> Kashif
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
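[Editor's note: the tuning options Kashif lists are applied one at a time with `gluster volume set`. A minimal dry-run sketch of doing that for a subset of them, using the `atlasglust` volume name from the volume info above; the script only prints the commands, so remove the `echo` to actually apply them:]

```shell
# Dry-run sketch: print `gluster volume set` commands for some of the
# tuning options discussed in this thread. VOL is taken from the
# `gluster volume info` output quoted above.
VOL=atlasglust
set -- \
  server.event-threads=4 \
  client.event-threads=4 \
  cluster.lookup-optimize=on \
  performance.parallel-readdir=on \
  performance.md-cache-timeout=600 \
  features.cache-invalidation=on \
  features.cache-invalidation-timeout=600
for kv in "$@"; do
  key=${kv%%=*}
  val=${kv#*=}
  # Remove the leading `echo` to execute for real on a gluster node.
  echo gluster volume set "$VOL" "$key" "$val"
done
```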
Hi,

I also noticed files disappearing with the combination of certain settings.
If you use cluster.readdir-optimize without some of the other settings,
they don't disappear. Unfortunately, I can't remember which setting was
conflicting...

Performance-wise, I don't see a difference between 3.10 and 3.11 over here.
I didn't test recently with 3.8.

Regards,
Jo

-----Original message-----
From: Vijay Bellur <vbellur at redhat.com>
Sent: Tue 11-07-2017 17:22
Subject: Re: [Gluster-users] Extremely slow du
To: mohammad kashif <kashif.alig at gmail.com>; Raghavendra Gowdappa <rgowdapp at redhat.com>; Poornima Gurusiddaiah <pgurusid at redhat.com>;
CC: gluster-users Discussion List <Gluster-users at gluster.org>;

> Hi Kashif,
>
> Thank you for your feedback! Do you have some data on the nature of the
> performance improvement observed with 3.11 in the new setup?
>
> Adding Raghavendra and Poornima for validation of the configuration and
> help with identifying why certain files disappeared from the mount point
> after enabling readdir-optimize.
>
> Regards,
> Vijay
>
> [snip]
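[Editor's note: on the disappearing-files symptom, one way to verify whether files that exist on a brick are invisible through the client mount is to diff the two listings, skipping Gluster's internal `.glusterfs` directory on the brick. The sketch below simulates the brick and mount with temporary directories so it is self-contained; on a real volume the two paths would be a brick directory and the FUSE mount point, and the check would be repeated for every brick:]

```shell
# Simulated brick and mount; on a real system these would be e.g.
# /glusteratlas/brick001/gv0 (on a server) and the client mount point.
BRICK=$(mktemp -d)
MNT=$(mktemp -d)
touch "$BRICK/a" "$BRICK/b" "$MNT/a"   # 'b' exists only on the brick

# Sorted file listing, pruning the internal .glusterfs tree.
list() { (cd "$1" && find . -name .glusterfs -prune -o -type f -print | sort); }

b=$(mktemp); m=$(mktemp)
list "$BRICK" > "$b"
list "$MNT"  > "$m"

# Lines present in the brick listing but absent from the mount listing.
missing=$(comm -23 "$b" "$m")
echo "$missing"
```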
Hi Vijay

Thanks. It would be great if someone could go through the configuration
options. Is there any reference document where all these options are
described in detail?

I was mainly worried about the very slow lookup, so I only ran du on a
certain directory which has a lot of small files (200K). The lookup time
improved dramatically. I didn't do any proper benchmarking.

Gluster 3.8 without any optimization:

time du -ksh binno/
3.7G binno/
real 117m45.733s
user 0m1.635s
sys 0m6.430s

Gluster 3.11 with optimization:

time du -ksh binno/
3.7G binno/
real 2m5.595s
user 0m0.767s
sys 0m4.437s

I have also enabled profiling.

Before the update:

Fop         Call Count   Avg-Latency   Min-Latency    Max-Latency
---         ----------   -----------   -----------    -----------
STAT               153      90.72 us       5.00 us      666.00 us
STATFS               3     677.67 us     620.00 us      709.00 us
OPENDIR            149    1213.81 us     519.00 us    28777.00 us
LOOKUP             552    8493.01 us       3.00 us    79689.00 us
READDIRP          3518    5351.76 us      11.00 us   341877.00 us
FORGET        10050351          0 us          0 us           0 us
RELEASE        9062130          0 us          0 us           0 us
RELEASEDIR        5395          0 us          0 us           0 us

After the update:

Interval 8 Stats:
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
---------   -----------   -----------   -----------   ------------   ----
     0.00       0.00 us       0.00 us       0.00 us              2   RELEASEDIR
     0.08     118.00 us     113.00 us     123.00 us              2   STATFS
     0.13     190.00 us     189.00 us     191.00 us              2   LOOKUP
     0.29     422.00 us     422.00 us     422.00 us              2   OPENDIR
    99.49   28539.60 us    1698.00 us   48655.00 us             10   READDIRP
     0.00       0.00 us       0.00 us       0.00 us           5217   UPCALL
     0.00       0.00 us       0.00 us       0.00 us           5217   CI_FORGET

Duration: 22 seconds
Data Read: 0 bytes
Data Written: 0 bytes

I am not sure about the profiling result, as I don't understand it fully.

Thanks
Kashif

On Tue, Jul 11, 2017 at 4:22 PM, Vijay Bellur <vbellur at redhat.com> wrote:
> Hi Kashif,
>
> Thank you for your feedback! Do you have some data on the nature of the
> performance improvement observed with 3.11 in the new setup?
>
> Adding Raghavendra and Poornima for validation of the configuration and
> help with identifying why certain files disappeared from the mount point
> after enabling readdir-optimize.
>
> Regards,
> Vijay
>
> [snip]
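[Editor's note: a quick way to read the "After the update" interval above is to rank the per-fop rows by their %-latency column; in that sample READDIRP accounts for ~99.5% of the time, which suggests the du run is dominated by directory reads rather than lookups. A small sketch, with the sample rows copied from the profile output above:]

```shell
# Rank profile rows by %-latency (first column) and report the fop that
# dominates. Rows are the "After the update" interval quoted above.
f=$(mktemp)
cat > "$f" <<'EOF'
0.00 0.00 us 0.00 us 0.00 us 2 RELEASEDIR
0.08 118.00 us 113.00 us 123.00 us 2 STATFS
0.13 190.00 us 189.00 us 191.00 us 2 LOOKUP
0.29 422.00 us 422.00 us 422.00 us 2 OPENDIR
99.49 28539.60 us 1698.00 us 48655.00 us 10 READDIRP
EOF
# Numeric descending sort on the first column, take the top row's fop name.
top=$(sort -rn "$f" | head -1 | awk '{print $NF}')
echo "$top"
```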