Alan Orth
2018-Jan-25 13:20 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
By the way, on a slightly related note, I'm pretty sure either parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x. We are running CentOS 7 with kernel-3.10.0-693.11.6.el7.x86_64.

I updated my servers and clients to 3.12.4 and enabled these two options after reading about them in the 3.10.0 and 3.11.0 release notes. In the days after enabling them, all of my clients kept getting disconnected from the volume. The error upon attempting to list a directory or read a file was "Transport endpoint is not connected", after which I would force unmount the volume with `umount -fl /home` and remount it, only to have it get disconnected again a few hours later.

Every time the volume disconnected I looked in the client mount log and only found information such as:

[2018-01-24 05:52:27.695225] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1: Completed metadata selfheal on ed3fbafc-734b-41ca-ab30-216399fb9168. sources=[0] sinks=1
[2018-01-24 05:52:27.700611] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 2-homes-replicate-1: performing metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa
[2018-01-24 05:52:27.703021] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1: Completed metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa. sources=[0] sinks=1

I enabled debug logging for that volume's client mount with `gluster volume set homes diagnostics.client-log-level DEBUG` and then I saw this in the client mount log the next time it disconnected:

[2018-01-24 08:55:19.138810] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: LOOKUP scheduled as fast fop
[2018-01-24 08:55:19.138849] D [MSGID: 0] [dht-common.c:2711:dht_lookup] 0-homes-dht: Calling fresh lookup for /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas on homes-readdir-ahead-1
[2018-01-24 08:55:19.138928] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: FSTAT scheduled as fast fop
[2018-01-24 08:55:19.138958] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-homes-replicate-1: e6ee0427-b17d-4464-a738-e8ea70d77d95: generation now vs cached: 2, 2
[2018-01-24 08:55:19.139187] D [MSGID: 0] [dht-common.c:2294:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas with op_ret 0
[2018-01-24 08:55:19.139200] D [MSGID: 0] [dht-layout.c:873:dht_layout_preset] 0-homes-dht: file 00000000-0000-0000-0000-000000000000, subvol = homes-readdir-ahead-1
[2018-01-24 08:55:19.139257] D [MSGID: 0] [io-threads.c:358:iot_schedule] 0-homes-io-threads: READDIRP scheduled as fast fop

On a hunch I disabled both parallel-readdir and readdir-ahead, which I had only enabled a few days before, and now all of the clients are much more stable, with zero disconnections in the days since I disabled those two volume options.

Please take a look! Thanks,

On Wed, Jan 24, 2018 at 5:59 AM Atin Mukherjee <amukherj at redhat.com> wrote:
> Adding Poornima to take a look at it and comment.
>
> On Tue, Jan 23, 2018 at 10:39 PM, Alan Orth <alan.orth at gmail.com> wrote:
>> Hello,
>>
>> I saw that parallel-readdir was an experimental feature in GlusterFS
>> version 3.10.0[0], became stable in version 3.11.0[1], and is now
>> recommended for small-file workloads in the Red Hat Gluster Storage
>> Server documentation[2]. I've successfully enabled this on one of my
>> volumes but I notice the following in the client mount log:
>>
>> [2018-01-23 10:24:24.048055] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-1: option 'parallel-readdir' is not recognized
>> [2018-01-23 10:24:24.048072] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-0: option 'parallel-readdir' is not recognized
>>
>> The GlusterFS version on the client and server is 3.12.4. What is
>> going on?
>>
>> [0] https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md
>> [1] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md
>> [2] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/small_file_performance_enhancements
>>
>> Thank you,
>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180125/c9f2c8fa/attachment.html>
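The enable/disable cycle described in the post can be sketched as shell commands. This is only a sketch: the volume name `homes` and mount point `/home` come from the thread, but `server1:/homes` is a placeholder for the actual volfile server and volume, and these are the standard option names rather than the exact commands Alan ran.

```shell
# Enabling the two options discussed in the 3.10/3.11 release notes
# (run on any server node in the trusted pool):
gluster volume set homes performance.readdir-ahead on
gluster volume set homes performance.parallel-readdir on

# The workaround from the post: turn both options back off...
gluster volume set homes performance.parallel-readdir off
gluster volume set homes performance.readdir-ahead off

# ...and recover a client stuck on "Transport endpoint is not connected"
# by force/lazy-unmounting and remounting (server1:/homes is hypothetical):
umount -fl /home
mount -t glusterfs server1:/homes /home
```

Note that `umount -fl` combines a forced and a lazy unmount, which detaches the mount point immediately even while the FUSE client is wedged.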
Vlad Kopylov
2018-Jan-26 04:10 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
Can you please test whether it's parallel-readdir or readdir-ahead that gives the disconnects, so we know which one to disable?

parallel-readdir was doing magic in the runs from this PDF from last year:
https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf

-v

On Thu, Jan 25, 2018 at 8:20 AM, Alan Orth <alan.orth at gmail.com> wrote:
> By the way, on a slightly related note, I'm pretty sure either
> parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x.
> [snip]
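Isolating the culprit the way Vlad asks could look like the sketch below: enable one option at a time and soak each configuration for a few days. The volume name comes from the thread; the client log path follows the usual FUSE-mount naming convention (`/var/log/glusterfs/<mountpoint>.log`) but may differ on a given system, and the note about parallel-readdir's dependency on readdir-ahead is an assumption worth verifying before relying on step 2.

```shell
# Step 1: readdir-ahead only; soak for a few days.
gluster volume set homes performance.parallel-readdir off
gluster volume set homes performance.readdir-ahead on
# Watch the client mount log for the failure signature:
grep -i "Transport endpoint is not connected" /var/log/glusterfs/home.log

# Step 2: if stable, swap to parallel-readdir only.
# (Assumption: in this era parallel-readdir pulls readdir-ahead into the
# client graph internally, so this combination may not isolate cleanly.)
gluster volume set homes performance.readdir-ahead off
gluster volume set homes performance.parallel-readdir on
```

Whichever configuration reproduces the disconnects points at the translator to leave disabled and to report against.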
Alan Orth
2018-Jan-26 11:59 UTC
[Gluster-users] parallel-readdir is not recognized in GlusterFS 3.12.4
Dear Vlad,

I'm sorry, I don't want to test this again on my system just yet! It caused too much instability for my users and I don't have enough resources for a development environment.

The only other variable that changed before the crashes was the metadata-cache group profile[0], which I enabled the same day as the parallel-readdir and readdir-ahead options:

$ gluster volume set homes group metadata-cache

I'm hoping Atin or Poornima can shed some light and squash this bug.

[0] https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md

Regards,

On Fri, Jan 26, 2018 at 6:10 AM Vlad Kopylov <vladkopy at gmail.com> wrote:
> can you please test parallel-readdir or readdir-ahead gives
> disconnects? so we know which to disable
> [snip]

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
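Since a `group` profile sets several options at once, auditing what actually changed on the volume helps narrow the suspects. A sketch, assuming the volume name `homes` from the thread (`gluster volume get` and `gluster volume reset` are the standard CLI verbs for inspecting and reverting options):

```shell
# Inspect the current values of the individually changed options:
gluster volume get homes performance.parallel-readdir
gluster volume get homes performance.readdir-ahead

# The metadata-cache group flips multiple cache-related options at once;
# dump everything and compare against a volume with defaults:
gluster volume get homes all

# Revert a single option back to its default if needed:
gluster volume reset homes performance.parallel-readdir
```

Diffing `volume get ... all` output before and after applying a group profile is a quick way to see exactly which options the profile touched.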