Strahil Nikolov
2020-Feb-10 15:21 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
> Hi
>
> I would describe our basic use case for gluster as:
> "data-store for a cold-standby application".
>
> A specific application is installed on 2 hardware machines, the data is
> kept in-sync between the 2 machines by a replica-2 gluster volume.
> (IOW: "RAID 1")
>
> At any one time only 1 machine has the volume mounted and the
> application running. If the machine goes down the application is started
> on the remaining machine.
> IOW at any one point in time there is only ever 1 "reader & writer"
> running.
>
> I profiled a performance problem we have with this application, which
> unfortunately we can't modify.
>
> The profile shows many "opendir/readdirp/releasedir" cycles, the
> directory in question has about 1000 files and the application "stalls"
> for several milliseconds any time it decides to do a readdir.
> The volume is mounted via FUSE and it appears that said operation is not
> cached at all.
>
> To provide a test-case I tried to replicate what the application does.
> The problematic operation is nearly perfectly emulated just by using
> "ls .".
>
> I created a script that replicates how we use gluster and demonstrates
> that a FUSE mount appears to be lacking any caching of readdir.
>
> A word about the test-environment:
> 2 identical servers
> Dual Socket Xeon CPU E5-2640 v3 (8 cores, 2.60GHz, HT enabled)
> RAM: 128GB DDR4 ECC (8x16GB)
> Storage: 2TB Intel P3520 PCIe-NVMe-SSD
> Network: Gluster: 10GB/s direct connect (no switch), external: 1Gbit/s
> OS: CentOS 7.7, installed with "Minimal" ISO, everything: default
> Up2Date as of: 2020-01-21 (Kernel: 3.10.0-1062.9.1.el7.x86_64)
> SELinux: Disabled
> SSH key for 1 -> 2 exchanged
> Gluster 6.7 packages installed via 'centos-release-gluster6'
>
> See attached: gluster-testcase-no-caching-of-dir-operations-for-fuse.sh
>
> The meat of the testcase is this, a profile of:
> ls .
> vs:
> ls . . . . . . . . . .
> (10 dots)
>
>> cat /root/profile-1-times | grep DIR | head -n 3
>  0.00       0.00 us     0.00 us     0.00 us      1  RELEASEDIR
>  0.27      66.79 us    66.79 us    66.79 us      1  OPENDIR
> 98.65   12190.30 us  9390.88 us 14989.73 us      2  READDIRP
>
>> cat /root/profile-10-times | grep DIR | head -n 3
>  0.00       0.00 us     0.00 us     0.00 us     10  RELEASEDIR
>  0.64     108.02 us    85.72 us   131.96 us     10  OPENDIR
> 99.36    8388.64 us  5174.71 us 14808.77 us     20  READDIRP
>
> This testcase shows perfect scaling:
> 10 times the requests result in 10 times the gluster operations.
>
> I would say ideally there should be no difference in the number of
> gluster operations, regardless of how often a directory is read in a
> short amount of time (with no changes in between).
>
> Is there something we can do to enable caching or otherwise improve
> performance?

Hi Matthias,

Have you tried the 'readdir-ahead' option?
According to the docs it is useful for 'improving sequential directory read performance'.
I'm not sure how gluster defines a sequential directory read, but it's worth trying.

Also, you can try metadata caching, as described in:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
The actual group should contain the following:
https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache

Best Regards,
Strahil Nikolov
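(A minimal sketch of how those two suggestions translate to gluster CLI
commands; "myvol" is a placeholder volume name, and the exact options the
group applies are whatever is listed in the group-metadata-cache file linked
above:)

    # check whether readdir-ahead is already enabled, and turn it on if not
    gluster volume get myvol performance.readdir-ahead
    gluster volume set myvol performance.readdir-ahead on

    # apply the predefined metadata-cache option group in one step
    gluster volume set myvol group metadata-cache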
Matthias Schniedermeyer
2020-Feb-10 15:32 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On 10.02.20 16:21, Strahil Nikolov wrote:
> On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
>> Hi
>>
>> I would describe our basic use case for gluster as:
>> "data-store for a cold-standby application".
>>
>> A specific application is installed on 2 hardware machines, the data is
>> kept in-sync between the 2 machines by a replica-2 gluster volume.
>> (IOW: "RAID 1")
>>
>> At any one time only 1 machine has the volume mounted and the
>> application running. If the machine goes down the application is started
>> on the remaining machine.
>> IOW at any one point in time there is only ever 1 "reader & writer"
>> running.
>>
>> I profiled a performance problem we have with this application, which
>> unfortunately we can't modify.
>>
>> The profile shows many "opendir/readdirp/releasedir" cycles, the
>> directory in question has about 1000 files and the application "stalls"
>> for several milliseconds any time it decides to do a readdir.
>> The volume is mounted via FUSE and it appears that said operation is not
>> cached at all.
>>
>> To provide a test-case I tried to replicate what the application does.
>> The problematic operation is nearly perfectly emulated just by using
>> "ls .".
>>
>> I created a script that replicates how we use gluster and demonstrates
>> that a FUSE mount appears to be lacking any caching of readdir.
>>
>> A word about the test-environment:
>> 2 identical servers
>> Dual Socket Xeon CPU E5-2640 v3 (8 cores, 2.60GHz, HT enabled)
>> RAM: 128GB DDR4 ECC (8x16GB)
>> Storage: 2TB Intel P3520 PCIe-NVMe-SSD
>> Network: Gluster: 10GB/s direct connect (no switch), external: 1Gbit/s
>> OS: CentOS 7.7, installed with "Minimal" ISO, everything: default
>> Up2Date as of: 2020-01-21 (Kernel: 3.10.0-1062.9.1.el7.x86_64)
>> SELinux: Disabled
>> SSH key for 1 -> 2 exchanged
>> Gluster 6.7 packages installed via 'centos-release-gluster6'
>>
>> See attached: gluster-testcase-no-caching-of-dir-operations-for-fuse.sh
>>
>> The meat of the testcase is this, a profile of:
>> ls .
>> vs:
>> ls . . . . . . . . . .
>> (10 dots)
>>
>>> cat /root/profile-1-times | grep DIR | head -n 3
>>  0.00       0.00 us     0.00 us     0.00 us      1  RELEASEDIR
>>  0.27      66.79 us    66.79 us    66.79 us      1  OPENDIR
>> 98.65   12190.30 us  9390.88 us 14989.73 us      2  READDIRP
>>
>>> cat /root/profile-10-times | grep DIR | head -n 3
>>  0.00       0.00 us     0.00 us     0.00 us     10  RELEASEDIR
>>  0.64     108.02 us    85.72 us   131.96 us     10  OPENDIR
>> 99.36    8388.64 us  5174.71 us 14808.77 us     20  READDIRP
>>
>> This testcase shows perfect scaling:
>> 10 times the requests result in 10 times the gluster operations.
>>
>> I would say ideally there should be no difference in the number of
>> gluster operations, regardless of how often a directory is read in a
>> short amount of time (with no changes in between).
>>
>> Is there something we can do to enable caching or otherwise improve
>> performance?
>
> Hi Matthias,
>
> Have you tried the 'readdir-ahead' option?
> According to the docs it is useful for 'improving sequential directory read performance'.
> I'm not sure how gluster defines a sequential directory read, but it's worth trying.

readdir-ahead is enabled by default, and has been for several years.
In effect this option changes how many READDIRP OPs are executed for a single
"ls ." (it takes more OPs when readdir-ahead is disabled).

> Also, you can try metadata caching, as described in:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
> The actual group should contain the following:
> https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache

Metadata caching, in general, works: e.g. `stat FILE` is cached if executed
repeatedly. AFAICT the big exception to metadata caching is readdir.

-- Matthias
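(For reference, a rough sketch of the kind of profiling run described above;
this is not the attached script itself, "myvol" and the mount path are
placeholders, and whether stopping and restarting the profile fully resets
the counters may depend on the gluster version:)

    VOL=myvol
    DIR=/mnt/myvol/dir-with-1000-files   # directory containing ~1000 files

    gluster volume profile "$VOL" start
    ( cd "$DIR" && ls . > /dev/null )
    gluster volume profile "$VOL" info > /root/profile-1-times

    # restart profiling so the second run starts from fresh counters
    gluster volume profile "$VOL" stop
    gluster volume profile "$VOL" start
    ( cd "$DIR" && ls . . . . . . . . . . > /dev/null )
    gluster volume profile "$VOL" info > /root/profile-10-times

    grep DIR /root/profile-1-times  | head -n 3
    grep DIR /root/profile-10-times | head -n 3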