Richard Neuboeck
2018-Sep-13 08:07 UTC
[Gluster-users] gluster connection interrupted during transfer
Hi,

I've created excerpts from the brick and client logs +/- 1 minute
around the kill event. The logs are still ~400-500MB, so I'll put them
somewhere to download, since I have no idea what I should be looking
for and skimming them didn't reveal any obvious problems to me.

http://www.tbi.univie.ac.at/~hawk/gluster/brick_3min_excerpt.log
http://www.tbi.univie.ac.at/~hawk/gluster/mnt_3min_excerpt.log

I was pointed in the direction of the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1613512
It sounds right but seems to have been addressed already.

If there is anything I can do to help solve this problem please let me
know. Thanks for your help!

Cheers
Richard
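As an aside, excerpts like these can be cut from a large gluster log
by comparing the bracketed timestamps lexically. A minimal sketch,
assuming the usual "[YYYY-MM-DD hh:mm:ss.usec]" line prefix visible in
the log samples quoted below; the window bounds are illustrative, and
multi-line entries without their own timestamp may need manual
touch-up:

  awk -v from="2018-09-09 05:43:00" -v to="2018-09-09 05:45:00" '
      { ts = substr($0, 2, 19) }     # "YYYY-MM-DD hh:mm:ss"
      ts >= from && ts <= to
  ' mnt.log > mnt_3min_excerpt.log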
On 9/11/18 10:10 AM, Richard Neuboeck wrote:
> Hi,
>
> since I feared that the logs would fill up the partition (again) I
> checked the systems daily and finally found the reason: the glusterfs
> process on the client runs out of memory and gets killed by the OOM
> killer after about four days. Since rsync runs for a couple of days
> longer before it finishes, I never checked the whole time frame in
> the system logs and never stumbled upon the OOM message.
>
> Running out of memory on a 128GB RAM system, even with a DB occupying
> ~40% of that, is kind of strange though. Might there be a leak?
>
> This would explain the erratic behavior I've experienced over the
> last 1.5 years while trying to work with our home directories on
> glusterfs.
>
> Here is the kernel log message for the killed glusterfs process:
> https://gist.github.com/bleuchien/3d2b87985ecb944c60347d5e8660e36a
>
> I'm checking the brick and client trace logs, but those are 1TB and
> 2TB in size respectively, so searching them takes a while. I'll be
> creating gists for both logs around the time the process died.
>
> As soon as I have more details I'll post them.
>
> Here you can see a graphical representation of the memory usage of
> this system: https://imgur.com/a/4BINtfr
>
> Cheers
> Richard
>
> On 31.08.18 08:13, Raghavendra Gowdappa wrote:
>> On Fri, Aug 31, 2018 at 11:11 AM, Richard Neuboeck
>> <hawk at tbi.univie.ac.at> wrote:
>>
>>> On 08/31/2018 03:50 AM, Raghavendra Gowdappa wrote:
>>>> +Mohit. +Milind
>>>>
>>>> @Mohit/Milind,
>>>>
>>>> Can you check the logs and see whether you can find anything
>>>> relevant?
>>>
>>> From glances at the system logs nothing out of the ordinary
>>> occurred. However, I'll start another rsync and take a closer
>>> look. It will take a few days.
>>>
>>>> On Thu, Aug 30, 2018 at 7:04 PM, Richard Neuboeck
>>>> <hawk at tbi.univie.ac.at> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm attaching a shortened version, since the whole client mount
>>>>> log is about 5.8GB. It includes the initial mount messages and
>>>>> the last two minutes of log entries.
>>>>>
>>>>> It ends very anticlimactically, without an obvious error. Is
>>>>> there anything specific I should be looking for?
>>>>
>>>> Normally I look at the logs around disconnect messages to find
>>>> out the reason. But as you said, sometimes one sees just
>>>> disconnect messages without any reason. That normally points to a
>>>> cause for the disconnect in the network rather than a
>>>> Glusterfs-initiated disconnect.
>>>
>>> The rsync source is serving our homes currently, so there are NFS
>>> connections 24/7. There don't seem to be any network related
>>> interruptions
>>
>> Can you set diagnostics.client-log-level and
>> diagnostics.brick-log-level to TRACE and check the logs on both
>> ends of the connection - client and brick? To reduce the log size,
>> I would suggest logrotating the existing logs and starting with
>> fresh logs just before you begin, so that only relevant logs are
>> captured. Also, can you take an strace of the client and brick
>> processes using:
>>
>> strace -o <outputfile> -ff -v -p <pid>
>>
>> and attach both logs and straces? Let's trace through what the
>> syscalls on the socket return and then decide whether to inspect a
>> tcpdump or not. If you don't want to repeat the tests again, please
>> capture a tcpdump too (on both ends of the connection) and send it
>> to us.
>>
>>> - a co-worker would be here faster than I could check the logs if
>>> the connection to home were broken ;-)
>>> The three gluster machines are, due to this problem, reduced to
>>> testing only, so there is nothing else running on them.
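As an aside, a minimal sketch of that capture procedure, assuming the
volume name "home" from the messages below and the standard gluster
CLI; interface, peer and port are placeholders (brick ports vary per
volume - check "gluster volume status home"):

  # raise the log levels on both ends (revert to INFO when done)
  gluster volume set home diagnostics.client-log-level TRACE
  gluster volume set home diagnostics.brick-log-level TRACE

  # start with fresh logs so only the relevant window is captured
  # (older releases use: gluster volume log rotate home)
  gluster volume log home rotate

  # strace the fuse client (on the client) or the brick process (on
  # a server), as suggested above
  strace -o gluster.strace -ff -v -p <pid>

  # optional: packet capture on both ends of the connection
  tcpdump -i <iface> -w gluster.pcap host <peer> and port <brick-port>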
>>>>> Cheers
>>>>> Richard
>>>>>
>>>>> On 08/30/2018 02:40 PM, Raghavendra Gowdappa wrote:
>>>>>> Normally the client logs will give a clue on why the
>>>>>> disconnections are happening (ping-timeout, wrong port etc.).
>>>>>> Can you look into the client logs to figure out what's
>>>>>> happening? If you can't find anything, can you send across the
>>>>>> client logs?
>>>>>>
>>>>>> On Wed, Aug 29, 2018 at 6:11 PM, Richard Neuboeck
>>>>>> <hawk at tbi.univie.ac.at> wrote:
>>>>>>
>>>>>>> Hi Gluster Community,
>>>>>>>
>>>>>>> I have problems with a glusterfs 'Transport endpoint not
>>>>>>> connected' connection abort during file transfers that I can
>>>>>>> replicate (all the time now) but cannot pinpoint as to why it
>>>>>>> is happening.
>>>>>>>
>>>>>>> The volume is set up in replica 3 mode and accessed with the
>>>>>>> fuse gluster client. Both client and server are running CentOS
>>>>>>> and the supplied 3.12.11 version of gluster.
>>>>>>>
>>>>>>> The connection abort happens at different times during rsync
>>>>>>> but occurs every time I try to sync all our files (1.1TB) to
>>>>>>> the empty volume.
>>>>>>>
>>>>>>> On neither the client nor the server side do I find errors in
>>>>>>> the gluster log files. rsync logs the obvious transfer
>>>>>>> problem. The only log that shows anything related is the
>>>>>>> server brick log, which states that the connection is shutting
>>>>>>> down:
>>>>>>>
>>>>>>> [2018-08-18 22:40:35.502510] I [MSGID: 115036]
>>>>>>> [server.c:527:server_rpc_notify] 0-home-server: disconnecting
>>>>>>> connection from
>>>>>>> brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>>>>>>> [2018-08-18 22:40:35.502620] W
>>>>>>> [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
>>>>>>> [2018-08-18 22:40:35.502692] W
>>>>>>> [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>>>>>>> [2018-08-18 22:40:35.502719] W
>>>>>>> [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>>>>>>> [2018-08-18 22:40:35.505950] I [MSGID: 101055]
>>>>>>> [client_t.c:443:gf_client_unref] 0-home-server: Shutting down
>>>>>>> connection
>>>>>>> brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>>>>>>>
>>>>>>> Since I'm running another replica 3 setup for oVirt, which has
>>>>>>> been completely stable for a long time now, I thought at first
>>>>>>> that I had made a mistake by setting different options.
>>>>>>> However, even after I reset those options I'm able to
>>>>>>> reproduce the connection problem.
>>>>>>>
>>>>>>> The unoptimized volume setup looks like this:
>>>>>>>
>>>>>>> Volume Name: home
>>>>>>> Type: Replicate
>>>>>>> Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: sphere-four:/srv/gluster_home/brick
>>>>>>> Brick2: sphere-five:/srv/gluster_home/brick
>>>>>>> Brick3: sphere-six:/srv/gluster_home/brick
>>>>>>> Options Reconfigured:
>>>>>>> nfs.disable: on
>>>>>>> transport.address-family: inet
>>>>>>> cluster.quorum-type: auto
>>>>>>> cluster.server-quorum-type: server
>>>>>>> cluster.server-quorum-ratio: 50%
>>>>>>>
>>>>>>> The following additional options were used before:
>>>>>>>
>>>>>>> performance.cache-size: 5GB
>>>>>>> client.event-threads: 4
>>>>>>> server.event-threads: 4
>>>>>>> cluster.lookup-optimize: on
>>>>>>> features.cache-invalidation: on
>>>>>>> performance.stat-prefetch: on
>>>>>>> performance.cache-invalidation: on
>>>>>>> network.inode-lru-limit: 50000
>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>> performance.md-cache-timeout: 600
>>>>>>> performance.parallel-readdir: on
>>>>>>>
>>>>>>> In this case the gluster servers and also the client are using
>>>>>>> a bonded network device running in adaptive load balancing
>>>>>>> mode.
>>>>>>>
>>>>>>> I've tried using the debug option for the client mount. But
>>>>>>> except for a ~0.5TB log file I didn't get information that
>>>>>>> seems helpful to me.
>>>>>>>
>>>>>>> Transferring just a couple of GB works without problems.
>>>>>>>
>>>>>>> It may very well be that I'm already blind to the obvious, but
>>>>>>> after many long-running tests I can't find the crux in the
>>>>>>> setup.
>>>>>>>
>>>>>>> Does anyone have an idea how to approach this problem in a way
>>>>>>> that sheds some useful information?
>>>>>>>
>>>>>>> Any help is highly appreciated!
>>>>>>> Cheers
>>>>>>> Richard
>>>>>>>
>>>>>>> --
>>>>>>> /dev/null
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> /dev/null
>>>
>>> --
>>> /dev/null
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
/dev/null
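A side note on the option experiments quoted above: with the standard
gluster CLI, single options can be applied and reverted per volume,
and "reset" without an option name reverts everything reconfigured. A
minimal sketch, using one of the options from the list above:

  # apply a single tuning option
  gluster volume set home performance.parallel-readdir on

  # revert one option, or all reconfigured options, to the defaults
  gluster volume reset home performance.parallel-readdir
  gluster volume reset home

  # verify what is currently in effect
  gluster volume get home all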
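The OOM kill described in the quoted thread is easier to catch early
by sampling the fuse client's memory over time. A minimal sketch,
assuming a bash-like shell; the pgrep pattern is illustrative and has
to match the actual mount:

  # sample the glusterfs fuse client's resident set size once a minute
  pid=$(pgrep -f 'glusterfs.*home')
  while sleep 60; do
      printf '%s %s\n' "$(date -Is)" \
          "$(awk '/VmRSS/ {print $2, $3}' /proc/$pid/status)"
  done >> glusterfs-rss.log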
Richard Neuboeck
2018-Sep-21 07:14 UTC
[Gluster-users] gluster connection interrupted during transfer
Hi again,

in my limited - non full-time programmer - understanding, it's a
memory leak in the gluster fuse client. Should I reopen the bug report
mentioned in my previous mail
(https://bugzilla.redhat.com/show_bug.cgi?id=1613512) or open a new
one? Or would the community prefer an entirely different approach?

Thanks
Richard

On 13.09.18 10:07, Richard Neuboeck wrote:
> [full quote of the previous message snipped - see above]
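One way to substantiate the leak suspicion before filing: take
periodic statedumps of the fuse client and compare the allocation
counters between them. A minimal sketch, assuming glusterfs' SIGUSR1
statedump mechanism and the default dump directory /var/run/gluster;
the pgrep pattern is illustrative:

  # trigger a statedump of the fuse client, repeat a few hours apart
  pid=$(pgrep -f 'glusterfs.*home')
  kill -USR1 "$pid"

  # dumps land in /var/run/gluster; allocation counts that only ever
  # grow between dumps point at the leaking allocation site
  ls /var/run/gluster/*.dump.*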