Richard Neuboeck
2018-Sep-13 08:07 UTC
[Gluster-users] gluster connection interrupted during transfer
Hi,

I've created excerpts from the brick and client logs +/- 1 minute
around the kill event. The logs are still ~400-500MB, so I'll put them
somewhere to download, since I have no idea what I should be looking
for and skimming them didn't reveal any obvious problems to me.

http://www.tbi.univie.ac.at/~hawk/gluster/brick_3min_excerpt.log
http://www.tbi.univie.ac.at/~hawk/gluster/mnt_3min_excerpt.log

I was pointed in the direction of the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1613512
It sounds right but seems to have been addressed already.

If there is anything I can do to help solve this problem please let me
know. Thanks for your help!

Cheers
Richard
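As an aside, excerpts like these can be cut from a large gluster log
by comparing the bracketed timestamps lexically. A minimal sketch,
assuming the usual "[YYYY-MM-DD hh:mm:ss.usec]" line prefix visible in
the log samples quoted below; the window bounds are illustrative, and
multi-line entries without their own timestamp may need manual
touch-up:

  awk -v from="2018-09-09 05:43:00" -v to="2018-09-09 05:45:00" '
      { ts = substr($0, 2, 19) }     # "YYYY-MM-DD hh:mm:ss"
      ts >= from && ts <= to
  ' mnt.log > mnt_3min_excerpt.log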
On 9/11/18 10:10 AM, Richard Neuboeck wrote:
> Hi,
>
> since I feared that the logs would fill up the partition (again) I
> checked the systems daily and finally found the reason: the glusterfs
> process on the client runs out of memory and gets killed by the OOM
> killer after about four days. Since rsync runs for a couple of days
> longer before it finishes, I never checked the whole time frame in
> the system logs and never stumbled upon the OOM message.
>
> Running out of memory on a 128GB RAM system, even with a DB occupying
> ~40% of that, is kind of strange though. Might there be a leak?
>
> This would explain the erratic behavior I've experienced over the
> last 1.5 years while trying to work with our home directories on
> glusterfs.
>
> Here is the kernel log message for the killed glusterfs process:
> https://gist.github.com/bleuchien/3d2b87985ecb944c60347d5e8660e36a
>
> I'm checking the brick and client trace logs, but those are 1TB and
> 2TB in size respectively, so searching them takes a while. I'll be
> creating gists for both logs around the time the process died.
>
> As soon as I have more details I'll post them.
>
> Here you can see a graphical representation of the memory usage of
> this system: https://imgur.com/a/4BINtfr
>
> Cheers
> Richard
>
> On 31.08.18 08:13, Raghavendra Gowdappa wrote:
>> On Fri, Aug 31, 2018 at 11:11 AM, Richard Neuboeck
>> <hawk at tbi.univie.ac.at> wrote:
>>
>>> On 08/31/2018 03:50 AM, Raghavendra Gowdappa wrote:
>>>> +Mohit. +Milind
>>>>
>>>> @Mohit/Milind,
>>>>
>>>> Can you check the logs and see whether you can find anything
>>>> relevant?
>>>
>>> From glances at the system logs nothing out of the ordinary
>>> occurred. However, I'll start another rsync and take a closer
>>> look. It will take a few days.
>>>
>>>> On Thu, Aug 30, 2018 at 7:04 PM, Richard Neuboeck
>>>> <hawk at tbi.univie.ac.at> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm attaching a shortened version, since the whole client mount
>>>>> log is about 5.8GB. It includes the initial mount messages and
>>>>> the last two minutes of log entries.
>>>>>
>>>>> It ends very anticlimactically, without an obvious error. Is
>>>>> there anything specific I should be looking for?
>>>>
>>>> Normally I look at the logs around disconnect messages to find
>>>> out the reason. But as you said, sometimes one sees just
>>>> disconnect messages without any reason. That normally points to a
>>>> cause for the disconnect in the network rather than a
>>>> Glusterfs-initiated disconnect.
>>>
>>> The rsync source is serving our homes currently, so there are NFS
>>> connections 24/7. There don't seem to be any network related
>>> interruptions
>>
>> Can you set diagnostics.client-log-level and
>> diagnostics.brick-log-level to TRACE and check the logs on both
>> ends of the connection - client and brick? To reduce the log size,
>> I would suggest logrotating the existing logs and starting with
>> fresh logs just before you begin, so that only relevant logs are
>> captured. Also, can you take an strace of the client and brick
>> processes using:
>>
>> strace -o <outputfile> -ff -v -p <pid>
>>
>> and attach both logs and straces? Let's trace through what the
>> syscalls on the socket return and then decide whether to inspect a
>> tcpdump or not. If you don't want to repeat the tests again, please
>> capture a tcpdump too (on both ends of the connection) and send it
>> to us.
>>
>>> - a co-worker would be here faster than I could check the logs if
>>> the connection to home were broken ;-)
>>> The three gluster machines are, due to this problem, reduced to
>>> testing only, so there is nothing else running on them.
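As an aside, a minimal sketch of that capture procedure, assuming the
volume name "home" from the messages below and the standard gluster
CLI; interface, peer and port are placeholders (brick ports vary per
volume - check "gluster volume status home"):

  # raise the log levels on both ends (revert to INFO when done)
  gluster volume set home diagnostics.client-log-level TRACE
  gluster volume set home diagnostics.brick-log-level TRACE

  # start with fresh logs so only the relevant window is captured
  # (older releases use: gluster volume log rotate home)
  gluster volume log home rotate

  # strace the fuse client (on the client) or the brick process (on
  # a server), as suggested above
  strace -o gluster.strace -ff -v -p <pid>

  # optional: packet capture on both ends of the connection
  tcpdump -i <iface> -w gluster.pcap host <peer> and port <brick-port>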
>>>>> Cheers
>>>>> Richard
>>>>>
>>>>> On 08/30/2018 02:40 PM, Raghavendra Gowdappa wrote:
>>>>>> Normally the client logs will give a clue on why the
>>>>>> disconnections are happening (ping-timeout, wrong port etc.).
>>>>>> Can you look into the client logs to figure out what's
>>>>>> happening? If you can't find anything, can you send across the
>>>>>> client logs?
>>>>>>
>>>>>> On Wed, Aug 29, 2018 at 6:11 PM, Richard Neuboeck
>>>>>> <hawk at tbi.univie.ac.at> wrote:
>>>>>>
>>>>>>> Hi Gluster Community,
>>>>>>>
>>>>>>> I have problems with a glusterfs 'Transport endpoint not
>>>>>>> connected' connection abort during file transfers that I can
>>>>>>> replicate (all the time now) but cannot pinpoint as to why it
>>>>>>> is happening.
>>>>>>>
>>>>>>> The volume is set up in replica 3 mode and accessed with the
>>>>>>> fuse gluster client. Both client and server are running CentOS
>>>>>>> and the supplied 3.12.11 version of gluster.
>>>>>>>
>>>>>>> The connection abort happens at different times during rsync
>>>>>>> but occurs every time I try to sync all our files (1.1TB) to
>>>>>>> the empty volume.
>>>>>>>
>>>>>>> On neither the client nor the server side do I find errors in
>>>>>>> the gluster log files. rsync logs the obvious transfer
>>>>>>> problem. The only log that shows anything related is the
>>>>>>> server brick log, which states that the connection is shutting
>>>>>>> down:
>>>>>>>
>>>>>>> [2018-08-18 22:40:35.502510] I [MSGID: 115036]
>>>>>>> [server.c:527:server_rpc_notify] 0-home-server: disconnecting
>>>>>>> connection from
>>>>>>> brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>>>>>>> [2018-08-18 22:40:35.502620] W
>>>>>>> [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
>>>>>>> [2018-08-18 22:40:35.502692] W
>>>>>>> [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>>>>>>> [2018-08-18 22:40:35.502719] W
>>>>>>> [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server:
>>>>>>> releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
>>>>>>> {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
>>>>>>> [2018-08-18 22:40:35.505950] I [MSGID: 101055]
>>>>>>> [client_t.c:443:gf_client_unref] 0-home-server: Shutting down
>>>>>>> connection
>>>>>>> brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
>>>>>>>
>>>>>>> Since I'm running another replica 3 setup for oVirt, which has
>>>>>>> been completely stable for a long time now, I thought at first
>>>>>>> that I had made a mistake by setting different options.
>>>>>>> However, even after I reset those options I'm able to
>>>>>>> reproduce the connection problem.
>>>>>>>
>>>>>>> The unoptimized volume setup looks like this:
>>>>>>>
>>>>>>> Volume Name: home
>>>>>>> Type: Replicate
>>>>>>> Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: sphere-four:/srv/gluster_home/brick
>>>>>>> Brick2: sphere-five:/srv/gluster_home/brick
>>>>>>> Brick3: sphere-six:/srv/gluster_home/brick
>>>>>>> Options Reconfigured:
>>>>>>> nfs.disable: on
>>>>>>> transport.address-family: inet
>>>>>>> cluster.quorum-type: auto
>>>>>>> cluster.server-quorum-type: server
>>>>>>> cluster.server-quorum-ratio: 50%
>>>>>>>
>>>>>>> The following additional options were used before:
>>>>>>>
>>>>>>> performance.cache-size: 5GB
>>>>>>> client.event-threads: 4
>>>>>>> server.event-threads: 4
>>>>>>> cluster.lookup-optimize: on
>>>>>>> features.cache-invalidation: on
>>>>>>> performance.stat-prefetch: on
>>>>>>> performance.cache-invalidation: on
>>>>>>> network.inode-lru-limit: 50000
>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>> performance.md-cache-timeout: 600
>>>>>>> performance.parallel-readdir: on
>>>>>>>
>>>>>>> In this case the gluster servers and also the client are using
>>>>>>> a bonded network device running in adaptive load balancing
>>>>>>> mode.
>>>>>>>
>>>>>>> I've tried using the debug option for the client mount. But
>>>>>>> except for a ~0.5TB log file I didn't get information that
>>>>>>> seems helpful to me.
>>>>>>>
>>>>>>> Transferring just a couple of GB works without problems.
>>>>>>>
>>>>>>> It may very well be that I'm already blind to the obvious, but
>>>>>>> after many long-running tests I can't find the crux in the
>>>>>>> setup.
>>>>>>>
>>>>>>> Does anyone have an idea how to approach this problem in a way
>>>>>>> that sheds some useful information?
>>>>>>>
>>>>>>> Any help is highly appreciated!
>>>>>>> Cheers
>>>>>>> Richard
>>>>>>>
>>>>>>> --
>>>>>>> /dev/null
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> /dev/null
>>>
>>> --
>>> /dev/null
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
/dev/null
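A side note on the option experiments quoted above: with the standard
gluster CLI, single options can be applied and reverted per volume,
and "reset" without an option name reverts everything reconfigured. A
minimal sketch, using one of the options from the list above:

  # apply a single tuning option
  gluster volume set home performance.parallel-readdir on

  # revert one option, or all reconfigured options, to the defaults
  gluster volume reset home performance.parallel-readdir
  gluster volume reset home

  # verify what is currently in effect
  gluster volume get home all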
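The OOM kill described in the quoted thread is easier to catch early
by sampling the fuse client's memory over time. A minimal sketch,
assuming a bash-like shell; the pgrep pattern is illustrative and has
to match the actual mount:

  # sample the glusterfs fuse client's resident set size once a minute
  pid=$(pgrep -f 'glusterfs.*home')
  while sleep 60; do
      printf '%s %s\n' "$(date -Is)" \
          "$(awk '/VmRSS/ {print $2, $3}' /proc/$pid/status)"
  done >> glusterfs-rss.log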
Richard Neuboeck
2018-Sep-21 07:14 UTC
[Gluster-users] gluster connection interrupted during transfer
Hi again,

in my limited - non full-time programmer - understanding, it's a
memory leak in the gluster fuse client. Should I reopen the bug report
mentioned in my previous mail
(https://bugzilla.redhat.com/show_bug.cgi?id=1613512) or open a new
one? Or would the community prefer an entirely different approach?

Thanks
Richard

On 13.09.18 10:07, Richard Neuboeck wrote:
> [full quote of the previous message snipped - see above]
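One way to substantiate the leak suspicion before filing: take
periodic statedumps of the fuse client and compare the allocation
counters between them. A minimal sketch, assuming glusterfs' SIGUSR1
statedump mechanism and the default dump directory /var/run/gluster;
the pgrep pattern is illustrative:

  # trigger a statedump of the fuse client, repeat a few hours apart
  pid=$(pgrep -f 'glusterfs.*home')
  kill -USR1 "$pid"

  # dumps land in /var/run/gluster; allocation counts that only ever
  # grow between dumps point at the leaking allocation site
  ls /var/run/gluster/*.dump.*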