Alan Orth
2018-Jan-25 13:20 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
By the way, on a slightly related note, I'm pretty sure either parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x. We are running CentOS 7 with kernel-3.10.0-693.11.6.el7.x86_64.

I updated my servers and clients to 3.12.4 and enabled these two options after reading about them in the 3.10.0 and 3.11.0 release notes. In the days after enabling them, all of my clients kept getting disconnected from the volume. The error upon attempting to list a directory or read a file was "Transport endpoint is not connected", after which I would force unmount the volume with `umount -fl /home` and remount it, only to have it get disconnected again a few hours later.

Every time the volume disconnected I looked in the client mount log and only found information such as:

[2018-01-24 05:52:27.695225] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1: Completed metadata selfheal on ed3fbafc-734b-41ca-ab30-216399fb9168. sources=[0] sinks=1
[2018-01-24 05:52:27.700611] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 2-homes-replicate-1: performing metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa
[2018-01-24 05:52:27.703021] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1: Completed metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa. sources=[0] sinks=1

I enabled debug logging for that volume's client mount with `gluster volume set homes diagnostics.client-log-level DEBUG` and then I saw this in the client mount log the next time it disconnected:

[2018-01-24 08:55:19.138810] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: LOOKUP scheduled as fast fop
[2018-01-24 08:55:19.138849] D [MSGID: 0] [dht-common.c:2711:dht_lookup] 0-homes-dht: Calling fresh lookup for /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas on homes-readdir-ahead-1
[2018-01-24 08:55:19.138928] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: FSTAT scheduled as fast fop
[2018-01-24 08:55:19.138958] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-homes-replicate-1: e6ee0427-b17d-4464-a738-e8ea70d77d95: generation now vs cached: 2, 2
[2018-01-24 08:55:19.139187] D [MSGID: 0] [dht-common.c:2294:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas with op_ret 0
[2018-01-24 08:55:19.139200] D [MSGID: 0] [dht-layout.c:873:dht_layout_preset] 0-homes-dht: file 00000000-0000-0000-0000-000000000000, subvol = homes-readdir-ahead-1
[2018-01-24 08:55:19.139257] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: READDIRP scheduled as fast fop

On a hunch I disabled both parallel-readdir and readdir-ahead, which I had only enabled a few days before, and now all of the clients are much more stable, with zero disconnections in the days since I disabled those two volume options.

Please take a look! Thanks,

On Wed, Jan 24, 2018 at 5:59 AM Atin Mukherjee <amukherj at redhat.com> wrote:
> Adding Poornima to take a look at it and comment.
>
> On Tue, Jan 23, 2018 at 10:39 PM, Alan Orth <alan.orth at gmail.com> wrote:
>> Hello,
>>
>> I saw that parallel-readdir was an experimental feature in GlusterFS
>> version 3.10.0[0], became stable in version 3.11.0[1], and is now
>> recommended for small-file workloads in the Red Hat Gluster Storage
>> Server documentation[2]. I've successfully enabled this on one of my
>> volumes but I notice the following in the client mount log:
>>
>> [2018-01-23 10:24:24.048055] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-1: option 'parallel-readdir' is not recognized
>> [2018-01-23 10:24:24.048072] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-0: option 'parallel-readdir' is not recognized
>>
>> The GlusterFS version on the client and server is 3.12.4. What is
>> going on?
>>
>> [0] https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md
>> [1] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md
>> [2] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/small_file_performance_enhancements
>>
>> Thank you,
>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180125/c9f2c8fa/attachment.html>
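The enable/disable cycle described in the post can be sketched as shell commands. This is only a sketch: the volume name `homes` and mount point `/home` come from the thread, but `server1:/homes` is a placeholder for the actual volfile server and volume, and these are the standard option names rather than the exact commands Alan ran.

```shell
# Enabling the two options discussed in the 3.10/3.11 release notes
# (run on any server node in the trusted pool):
gluster volume set homes performance.readdir-ahead on
gluster volume set homes performance.parallel-readdir on

# The workaround from the post: turn both options back off...
gluster volume set homes performance.parallel-readdir off
gluster volume set homes performance.readdir-ahead off

# ...and recover a client stuck on "Transport endpoint is not connected"
# by force/lazy-unmounting and remounting (server1:/homes is hypothetical):
umount -fl /home
mount -t glusterfs server1:/homes /home
```

Note that `umount -fl` combines a forced and a lazy unmount, which detaches the mount point immediately even while the FUSE client is wedged.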
Vlad Kopylov
2018-Jan-26 04:10 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
Can you please test whether it's parallel-readdir or readdir-ahead that gives the disconnects, so we know which one to disable?

parallel-readdir was doing magic in the runs from this PDF from last year:
https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf

-v

On Thu, Jan 25, 2018 at 8:20 AM, Alan Orth <alan.orth at gmail.com> wrote:
> By the way, on a slightly related note, I'm pretty sure either
> parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x.
> [snip]
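Isolating the culprit the way Vlad asks could look like the sketch below: enable one option at a time and soak each configuration for a few days. The volume name comes from the thread; the client log path follows the usual FUSE-mount naming convention (`/var/log/glusterfs/<mountpoint>.log`) but may differ on a given system, and the note about parallel-readdir's dependency on readdir-ahead is an assumption worth verifying before relying on step 2.

```shell
# Step 1: readdir-ahead only; soak for a few days.
gluster volume set homes performance.parallel-readdir off
gluster volume set homes performance.readdir-ahead on
# Watch the client mount log for the failure signature:
grep -i "Transport endpoint is not connected" /var/log/glusterfs/home.log

# Step 2: if stable, swap to parallel-readdir only.
# (Assumption: in this era parallel-readdir pulls readdir-ahead into the
# client graph internally, so this combination may not isolate cleanly.)
gluster volume set homes performance.readdir-ahead off
gluster volume set homes performance.parallel-readdir on
```

Whichever configuration reproduces the disconnects points at the translator to leave disabled and to report against.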
Alan Orth
2018-Jan-26 11:59 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
Dear Vlad,

I'm sorry, I don't want to test this again on my system just yet! It caused too much instability for my users and I don't have enough resources for a development environment.

The only other variable that changed before the crashes was the metadata-cache group profile[0], which I enabled the same day as the parallel-readdir and readdir-ahead options:

$ gluster volume set homes group metadata-cache

I'm hoping Atin or Poornima can shed some light and squash this bug.

[0] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md

Regards,

On Fri, Jan 26, 2018 at 6:10 AM Vlad Kopylov <vladkopy at gmail.com> wrote:
> can you please test parallel-readdir or readdir-ahead gives
> disconnects? so we know which to disable
> [snip]

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
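Since a `group` profile sets several options at once, auditing what actually changed on the volume helps narrow the suspects. A sketch, assuming the volume name `homes` from the thread (`gluster volume get` and `gluster volume reset` are the standard CLI verbs for inspecting and reverting options):

```shell
# Inspect the current values of the individually changed options:
gluster volume get homes performance.parallel-readdir
gluster volume get homes performance.readdir-ahead

# The metadata-cache group flips multiple cache-related options at once;
# dump everything and compare against a volume with defaults:
gluster volume get homes all

# Revert a single option back to its default if needed:
gluster volume reset homes performance.parallel-readdir
```

Diffing `volume get ... all` output before and after applying a group profile is a quick way to see exactly which options the profile touched.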