Matthias Schniedermeyer
2020-Feb-10 15:32 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On 10.02.20 16:21, Strahil Nikolov wrote:
> On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
>> Hi
>>
>> I would describe our basic use case for gluster as:
>> "data store for a cold-standby application".
>>
>> A specific application is installed on 2 hardware machines; the data is
>> kept in sync between the 2 machines by a replica-2 gluster volume
>> (IOW: "RAID 1").
>>
>> At any one time only 1 machine has the volume mounted and the
>> application running. If the machine goes down, the application is
>> started on the remaining machine.
>> IOW, at any one point in time there is only ever 1 "reader & writer"
>> running.
>>
>> I profiled a performance problem we have with this application, which
>> unfortunately we can't modify.
>>
>> The profile shows many "opendir/readdirp/releasedir" cycles. The
>> directory in question has about 1000 files, and the application
>> "stalls" for several milliseconds any time it decides to do a readdir.
>> The volume is mounted via FUSE, and it appears that said operation is
>> not cached at all.
>>
>> To provide a test case I tried to replicate what the application does.
>> The problematic operation is nearly perfectly emulated just by using
>> "ls .".
>>
>> I created a script that replicates how we use gluster and demonstrates
>> that a FUSE mount appears to be lacking any caching of readdir.
>>
>> A word about the test environment:
>> 2 identical servers
>> Dual-socket Xeon CPU E5-2640 v3 (8 cores, 2.60GHz, HT enabled)
>> RAM: 128GB DDR4 ECC (8x16GB)
>> Storage: 2TB Intel P3520 PCIe NVMe SSD
>> Network: Gluster: 10Gbit/s direct connect (no switch), external: 1Gbit/s
>> OS: CentOS 7.7, installed with the "Minimal" ISO, everything at defaults
>> Up to date as of: 2020-01-21 (kernel: 3.10.0-1062.9.1.el7.x86_64)
>> SELinux: disabled
>> SSH key for 1 -> 2 exchanged
>> Gluster 6.7 packages installed via 'centos-release-gluster6'
>>
>> See attached: gluster-testcase-no-caching-of-dir-operations-for-fuse.sh
>>
>> The meat of the test case is a profile of:
>> ls .
>> vs:
>> ls . . . . . . . . . .
>> (10 dots)
>>
>>> cat /root/profile-1-times | grep DIR | head -n 3
>>    0.00       0.00 us       0.00 us       0.00 us    1  RELEASEDIR
>>    0.27      66.79 us      66.79 us      66.79 us    1  OPENDIR
>>   98.65   12190.30 us    9390.88 us   14989.73 us    2  READDIRP
>>
>>> cat /root/profile-10-times | grep DIR | head -n 3
>>    0.00       0.00 us       0.00 us       0.00 us   10  RELEASEDIR
>>    0.64     108.02 us      85.72 us     131.96 us   10  OPENDIR
>>   99.36    8388.64 us    5174.71 us   14808.77 us   20  READDIRP
>>
>> This test case shows perfect scaling:
>> 10 times the requests result in 10 times the gluster operations.
>>
>> Ideally there should be no difference in the number of gluster
>> operations, regardless of how often a directory is read in a short
>> amount of time (with no changes in between).
>>
>> Is there something we can do to enable caching or otherwise improve
>> performance?
>
> Hi Matthias,
>
> Have you tried the 'readdir-ahead' option?
> According to the docs it is useful for 'improving sequential directory read performance'.
> I'm not sure how gluster defines a sequential directory read, but it's worth trying.

readdir-ahead is enabled by default, and has been for several years.
In effect this option changes how many READDIRP OPs are executed for a
single "ls ." (it takes more OPs when readdir-ahead is disabled).
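For anyone replicating the test, the option can be checked and toggled
per volume ('VOLNAME' standing in for the actual volume name):

  # show the effective value (on by default)
  gluster volume get VOLNAME performance.readdir-ahead
  # toggle it to compare the READDIRP counts in the profile
  gluster volume set VOLNAME performance.readdir-ahead off
  gluster volume set VOLNAME performance.readdir-ahead on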
> Also, you can try metadata caching, as described in:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
> The actual group should contain the following:
> https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache

Metadata caching, in general, works:
e.g. `stat FILE` is cached if executed repeatedly.

AFAICT the big exception to metadata caching is readdir.
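For reference, the whole group can be applied in one step ('VOLNAME'
again being a placeholder for the volume name):

  gluster volume set VOLNAME group metadata-cache

--
Matthias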
Strahil Nikolov
2020-Feb-10 16:31 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On February 10, 2020 5:32:29 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
> On 10.02.20 16:21, Strahil Nikolov wrote:
>> On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
>>> [...]
>>>
>>> Is there something we can do to enable caching or otherwise improve
>>> performance?
>>
>> Hi Matthias,
>>
>> Have you tried the 'readdir-ahead' option?
>> According to the docs it is useful for 'improving sequential directory read performance'.
>> I'm not sure how gluster defines a sequential directory read, but it's worth trying.
>
> readdir-ahead is enabled by default, and has been for several years.
> In effect this option changes how many READDIRP OPs are executed for a
> single "ls ." (it takes more OPs when readdir-ahead is disabled).
>
>> Also, you can try metadata caching, as described in:
>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
>> The actual group should contain the following:
>> https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache
>
> Metadata caching, in general, works:
> e.g. `stat FILE` is cached if executed repeatedly.
>
> AFAICT the big exception to metadata caching is readdir.

Hi Matthias,

This has now turned into a shot in the dark.
I went through a nice presentation and these 2 options attracted my attention:

performance.parallel-readdir on
cluster.readdir-hashed on

The presentation can be found at:
https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf

I hope you find something useful there.
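If you want to try them, something like this should do ('VOLNAME' standing
in for your volume name; the option names are copied as-is from the slides,
so it is worth double-checking them against 'gluster volume get VOLNAME all'
first):

  # parallel-readdir is a known volume option; it needs readdir-ahead,
  # which is already on by default
  gluster volume set VOLNAME performance.parallel-readdir on
  # option name as given in the presentation
  gluster volume set VOLNAME cluster.readdir-hashed on

Best Regards,
Strahil Nikolov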