Matthias Schniedermeyer
2020-Feb-10 15:32 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On 10.02.20 16:21, Strahil Nikolov wrote:
> On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
>> Hi
>>
>> I would describe our basic use case for gluster as:
>> "data store for a cold-standby application".
>>
>> A specific application is installed on 2 hardware machines; the data is
>> kept in sync between the 2 machines by a replica-2 gluster volume
>> (IOW: "RAID 1").
>>
>> At any one time only 1 machine has the volume mounted and the
>> application running. If the machine goes down, the application is
>> started on the remaining machine.
>> IOW, at any one point in time there is only ever 1 "reader & writer"
>> running.
>>
>> I profiled a performance problem we have with this application, which
>> unfortunately we can't modify.
>>
>> The profile shows many "opendir/readdirp/releasedir" cycles. The
>> directory in question has about 1000 files, and the application
>> "stalls" for several milliseconds any time it decides to do a readdir.
>> The volume is mounted via FUSE, and it appears that said operation is
>> not cached at all.
>>
>> To provide a test case I tried to replicate what the application does.
>> The problematic operation is nearly perfectly emulated just by using
>> "ls .".
>>
>> I created a script that replicates how we use gluster and demonstrates
>> that a FUSE mount appears to be lacking any caching of readdir.
>>
>> A word about the test environment:
>> 2 identical servers
>> Dual-socket Xeon CPU E5-2640 v3 (8 cores, 2.60GHz, HT enabled)
>> RAM: 128GB DDR4 ECC (8x16GB)
>> Storage: 2TB Intel P3520 PCIe NVMe SSD
>> Network: Gluster: 10Gbit/s direct connect (no switch), external: 1Gbit/s
>> OS: CentOS 7.7, installed with the "Minimal" ISO, everything at defaults
>> Up to date as of: 2020-01-21 (kernel: 3.10.0-1062.9.1.el7.x86_64)
>> SELinux: disabled
>> SSH key for 1 -> 2 exchanged
>> Gluster 6.7 packages installed via 'centos-release-gluster6'
>>
>> See attached: gluster-testcase-no-caching-of-dir-operations-for-fuse.sh
>>
>> The meat of the test case is a profile of:
>> ls .
>> vs:
>> ls . . . . . . . . . .
>> (10 dots)
>>
>>> cat /root/profile-1-times | grep DIR | head -n 3
>>    0.00       0.00 us       0.00 us       0.00 us    1  RELEASEDIR
>>    0.27      66.79 us      66.79 us      66.79 us    1  OPENDIR
>>   98.65   12190.30 us    9390.88 us   14989.73 us    2  READDIRP
>>
>>> cat /root/profile-10-times | grep DIR | head -n 3
>>    0.00       0.00 us       0.00 us       0.00 us   10  RELEASEDIR
>>    0.64     108.02 us      85.72 us     131.96 us   10  OPENDIR
>>   99.36    8388.64 us    5174.71 us   14808.77 us   20  READDIRP
>>
>> This test case shows perfect scaling:
>> 10 times the requests result in 10 times the gluster operations.
>>
>> Ideally there should be no difference in the number of gluster
>> operations, regardless of how often a directory is read in a short
>> amount of time (with no changes in between).
>>
>> Is there something we can do to enable caching or otherwise improve
>> performance?
>
> Hi Matthias,
>
> Have you tried the 'readdir-ahead' option?
> According to the docs it is useful for 'improving sequential directory read performance'.
> I'm not sure how gluster defines a sequential directory read, but it's worth trying.

readdir-ahead is enabled by default, and has been for several years.
In effect this option changes how many READDIRP OPs are executed for a
single "ls ." (it takes more OPs when readdir-ahead is disabled).
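For anyone replicating the test, the option can be checked and toggled
per volume ('VOLNAME' standing in for the actual volume name):

  # show the effective value (on by default)
  gluster volume get VOLNAME performance.readdir-ahead
  # toggle it to compare the READDIRP counts in the profile
  gluster volume set VOLNAME performance.readdir-ahead off
  gluster volume set VOLNAME performance.readdir-ahead on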
> Also, you can try metadata caching, as described in:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
> The actual group should contain the following:
> https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache

Metadata caching, in general, works:
e.g. `stat FILE` is cached if executed repeatedly.

AFAICT the big exception to metadata caching is readdir.
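For reference, the whole group can be applied in one step ('VOLNAME'
again being a placeholder for the volume name):

  gluster volume set VOLNAME group metadata-cache

--
Matthias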
Strahil Nikolov
2020-Feb-10 16:31 UTC
[Gluster-users] It appears that readdir is not cached for FUSE mounts
On February 10, 2020 5:32:29 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
> On 10.02.20 16:21, Strahil Nikolov wrote:
>> On February 10, 2020 2:25:17 PM GMT+02:00, Matthias Schniedermeyer <matthias-gluster-users at maxcluster.de> wrote:
>>> [...]
>>>
>>> Is there something we can do to enable caching or otherwise improve
>>> performance?
>>
>> Hi Matthias,
>>
>> Have you tried the 'readdir-ahead' option?
>> According to the docs it is useful for 'improving sequential directory read performance'.
>> I'm not sure how gluster defines a sequential directory read, but it's worth trying.
>
> readdir-ahead is enabled by default, and has been for several years.
> In effect this option changes how many READDIRP OPs are executed for a
> single "ls ." (it takes more OPs when readdir-ahead is disabled).
>
>> Also, you can try metadata caching, as described in:
>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-directory_operations
>> The actual group should contain the following:
>> https://github.com/gluster/glusterfs/blob/master/extras/group-metadata-cache
>
> Metadata caching, in general, works:
> e.g. `stat FILE` is cached if executed repeatedly.
>
> AFAICT the big exception to metadata caching is readdir.

Hi Matthias,

This has now turned into a shot in the dark.
I went through a nice presentation and these 2 options attracted my attention:

performance.parallel-readdir on
cluster.readdir-hashed on

The presentation can be found at:
https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf

I hope you find something useful there.
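If you want to try them, something like this should do ('VOLNAME' standing
in for your volume name; the option names are copied as-is from the slides,
so it is worth double-checking them against 'gluster volume get VOLNAME all'
first):

  # parallel-readdir is a known volume option; it needs readdir-ahead,
  # which is already on by default
  gluster volume set VOLNAME performance.parallel-readdir on
  # option name as given in the presentation
  gluster volume set VOLNAME cluster.readdir-hashed on

Best Regards,
Strahil Nikolov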