thr3ads.net - Gluster users - [Gluster-users] gluster connection interrupted during transfer [Aug 2018]

Raghavendra Gowdappa

2018-Aug-31 01:50 UTC

[Gluster-users] gluster connection interrupted during transfer

+Mohit, +Milind

@Mohit/Milind,

Can you check logs and see whether you can find anything relevant?

On Thu, Aug 30, 2018 at 7:04 PM, Richard Neuboeck <hawk at tbi.univie.ac.at> wrote:
> Hi,
>
> I'm attaching a shortened version of the client mount log, since the
> whole log is about 5.8GB. It includes the initial mount messages and
> the last two minutes of log entries.
>
> It ends rather anticlimactically, without an obvious error. Is there
> anything specific I should be looking for?
>
Normally I look at the logs around the disconnect messages to find out the
reason. But as you said, sometimes one sees only the disconnect messages
without any reason. That normally points to a cause in the network rather
than a GlusterFS-initiated disconnect.
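Looking "around the disconnect messages" in a multi-GB log can be automated. The following is a minimal Python sketch, not a GlusterFS tool. It assumes that any line containing "disconnect" counts as a disconnect message, like the `server_rpc_notify` line in the brick log quoted below. The hint keywords in `HINTS` are hypothetical and only loosely based on the reasons named in this thread (ping-timeout, wrong port). If no window contains a hint, that is consistent with the network-side explanation above.

```python
import re
import sys
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

# Assumption: a "disconnect message" is any line containing "disconnect"
# (case-insensitive), like the server_rpc_notify line in the brick log.
DISCONNECT = re.compile(r"disconnect", re.IGNORECASE)
# Hypothetical hint keywords, loosely based on the reasons named in this thread.
HINTS = ("ping", "timeout", "port")


def disconnect_windows(lines: Iterable[str], before: int = 20,
                       after: int = 5) -> Iterator[List[str]]:
    """Yield the lines surrounding each disconnect message.

    The input is streamed, so this also works on multi-GB log files.
    """
    history: Deque[str] = deque(maxlen=before)
    pending: List[Tuple[List[str], int]] = []  # (window, trailing lines still needed)
    for raw in lines:
        line = raw.rstrip("\n")
        still_open = []
        for window, remaining in pending:
            window.append(line)
            if remaining > 1:
                still_open.append((window, remaining - 1))
            else:
                yield window
        pending = still_open
        if DISCONNECT.search(line):
            window = list(history) + [line]
            if after > 0:
                pending.append((window, after))
            else:
                yield window
        history.append(line)
    # Windows that reached end-of-file before collecting all trailing lines.
    for window, _ in pending:
        yield window


def hints_in(window: List[str]) -> List[str]:
    """Return the hint keywords that occur anywhere in a window."""
    text = "\n".join(window).lower()
    return [h for h in HINTS if h in text]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python disconnects.py <client-mount-log>
    with open(sys.argv[1], errors="replace") as log:
        for window in disconnect_windows(log):
            print("hints:", ", ".join(hints_in(window)) or "none (possibly network-level)")
            print("\n".join(window), end="\n\n")
```

The script streams the log instead of loading it, because the logs discussed here run to many gigabytes.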

> Cheers
> Richard
>
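Shortening the log this way means keeping the initial mount messages and the last two minutes of entries. A minimal Python sketch of that step follows. It assumes the `[YYYY-MM-DD HH:MM:SS.ffffff]` timestamp prefix seen in the log lines quoted in this thread. It also assumes that lines without a timestamp belong to the preceding entry. The `head` default of 200 lines is an arbitrary, hypothetical choice.

```python
import re
import sys
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List, Optional, Tuple

# Timestamp prefix as seen in the GlusterFS log lines quoted in this thread.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\]")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the entry's timestamp, or None for lines without one."""
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f") if m else None


def shorten_log(lines: Iterable[str], head: int = 200,
                tail: timedelta = timedelta(minutes=2)) -> List[str]:
    """Keep the first `head` lines plus all lines from the last `tail` of the log."""
    first: List[str] = []
    recent: Deque[Tuple[datetime, str]] = deque()
    last_ts: Optional[datetime] = None
    for i, line in enumerate(lines):
        # Continuation lines (no timestamp) inherit the previous entry's time.
        ts = parse_ts(line) or last_ts
        last_ts = ts
        if i < head:
            first.append(line)
            continue
        if ts is None:
            continue
        recent.append((ts, line))
        # Drop entries older than `tail` relative to the newest one seen so far.
        while recent[0][0] < ts - tail:
            recent.popleft()
    return first + [line for _, line in recent]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python shorten.py <client-mount-log> <output-file>
    with open(sys.argv[1], errors="replace") as src:
        kept = shorten_log(src)
    with open(sys.argv[2], "w") as dst:
        dst.writelines(kept)
```

Only the trailing two-minute window is held in memory. Even so, on a debug-level log that window can still be large.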
> On 08/30/2018 02:40 PM, Raghavendra Gowdappa wrote:
> > Normally the client logs give a clue as to why the disconnections are
> > happening (ping-timeout, wrong port, etc.). Can you look into the client
> > logs to figure out what's happening? If you can't find anything, can
> > you send across the client logs?
> >
> > On Wed, Aug 29, 2018 at 6:11 PM, Richard Neuboeck
> > <hawk at tbi.univie.ac.at> wrote:
> >
> >     Hi Gluster Community,
> >
> >     I have problems with a glusterfs 'Transport endpoint not connected'
> >     connection abort during file transfers. I can now reproduce it every
> >     time, but I can't pinpoint why it is happening.
> >
> >     The volume is set up in replica 3 mode and accessed with the fuse
> >     gluster client. Both client and server are running CentOS and the
> >     supplied 3.12.11 version of gluster.
> >
> >     The connection abort happens at different times during rsync but
> >     occurs every time I try to sync all our files (1.1TB) to the empty
> >     volume.
> >
> >     I don't find errors in the gluster log files on either the client or
> >     the server side. rsync logs the obvious transfer problem. The only log
> >     that shows anything related is the server brick log, which states that
> >     the connection is shutting down:
> >
> >     [2018-08-18 22:40:35.502510] I [MSGID: 115036] [server.c:527:server_rpc_notify] 0-home-server: disconnecting connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
> >     [2018-08-18 22:40:35.502620] W [inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing lock on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
> >     [2018-08-18 22:40:35.502692] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
> >     [2018-08-18 22:40:35.502719] W [entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by {client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
> >     [2018-08-18 22:40:35.505950] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-home-server: Shutting down connection brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
> >
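The lock-cleanup warnings in this brick log follow a regular pattern, so they can be summarized per lock type, gfid and lock owner. The following is a minimal Python sketch. The regex is derived only from the `pl_inodelk_log_cleanup` / `pl_entrylk_log_cleanup` lines quoted here, and it may not match the output of other GlusterFS versions. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Pattern derived only from the lock-cleanup lines quoted in this thread.
CLEANUP = re.compile(
    r"\[(?P<ts>[^\]]+)\] W \[(?P<src>(?:inodelk|entrylk)\.c:\d+):\w+\] "
    r"(?P<xlator>\S+): releasing lock on (?P<gfid>[0-9a-f-]{36}) held by "
    r"\{client=(?P<client>0x[0-9a-f]+), pid=(?P<pid>\d+) lk-owner=(?P<owner>[0-9a-f]+)\}"
)


def lock_cleanups(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Extract one record per lock released while the connection was torn down."""
    return [m.groupdict() for line in lines if (m := CLEANUP.search(line))]


def summarize(records: List[Dict[str, str]]) -> Counter:
    """Count releases per (lock type, gfid, lk-owner)."""
    return Counter((r["src"].split(".")[0], r["gfid"], r["owner"]) for r in records)
```

Grouping the releases by gfid can show which files or directories the rsync process (pid 110423 here) was working on when the connection dropped.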
> >     Since I have been running another replica 3 setup for oVirt for a
> >     long time now, and it is completely stable, I first thought I had
> >     made a mistake setting different options. However, even after
> >     resetting those options I can still reproduce the connection problem.
> >
> >     The unoptimized volume setup looks like this:
> >
> >     Volume Name: home
> >     Type: Replicate
> >     Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
> >     Status: Started
> >     Snapshot Count: 0
> >     Number of Bricks: 1 x 3 = 3
> >     Transport-type: tcp
> >     Bricks:
> >     Brick1: sphere-four:/srv/gluster_home/brick
> >     Brick2: sphere-five:/srv/gluster_home/brick
> >     Brick3: sphere-six:/srv/gluster_home/brick
> >     Options Reconfigured:
> >     nfs.disable: on
> >     transport.address-family: inet
> >     cluster.quorum-type: auto
> >     cluster.server-quorum-type: server
> >     cluster.server-quorum-ratio: 50%
> >
> >
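The `cluster.quorum-type: auto` option controls client-side quorum. As we understand the GlusterFS documentation, with `auto` writes are allowed only when more than half of the replica's bricks are up, or when exactly half are up and that half includes the first brick. The following is a small Python sketch of that rule as stated, not taken from the GlusterFS source; `auto_quorum_met` is a hypothetical name. For this 1 x 3 volume it shows that any single brick can be down while quorum still holds, but two bricks down cannot.

```python
import sys
from itertools import product
from typing import Sequence


def auto_quorum_met(bricks_up: Sequence[bool]) -> bool:
    """Client-quorum rule for cluster.quorum-type=auto, as described above:
    more than half of the replica's bricks up, or exactly half up
    when that half includes the first brick."""
    n, up = len(bricks_up), sum(bricks_up)
    if n == 0:
        return False
    if 2 * up > n:
        return True
    return 2 * up == n and bool(bricks_up[0])


if __name__ == "__main__" and "--table" in sys.argv:
    # Replica 3 (sphere-four, sphere-five, sphere-six): every up/down combination.
    for state in product([True, False], repeat=3):
        print(state, auto_quorum_met(state))
```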
> >     The following additional options were used before:
> >
> >     performance.cache-size: 5GB
> >     client.event-threads: 4
> >     server.event-threads: 4
> >     cluster.lookup-optimize: on
> >     features.cache-invalidation: on
> >     performance.stat-prefetch: on
> >     performance.cache-invalidation: on
> >     network.inode-lru-limit: 50000
> >     features.cache-invalidation-timeout: 600
> >     performance.md-cache-timeout: 600
> >     performance.parallel-readdir: on
> >
> >
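To return the volume to the unoptimized state shown above, each previously set option can be reset individually with `gluster volume reset <VOLNAME> <OPTION>`. The following Python sketch turns the option listing above into those commands and only prints them. It assumes the `key: value` layout used in this message, and the helper names are hypothetical.

```python
from typing import Dict, List

# The options listed above as "used before", copied verbatim.
PREVIOUS = """\
performance.cache-size: 5GB
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 50000
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 600
performance.parallel-readdir: on
"""


def parse_options(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines (as in 'gluster volume info' output) into a dict."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            opts[key.strip()] = value.strip()
    return opts


def reset_commands(volume: str, options: Dict[str, str]) -> List[str]:
    """Build one 'gluster volume reset' command per option; nothing is executed."""
    return [f"gluster volume reset {volume} {key}" for key in options]


if __name__ == "__main__":
    for cmd in reset_commands("home", parse_options(PREVIOUS)):
        print(cmd)
```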
> >     In this case the gluster servers and the client all use a bonded
> >     network device running in adaptive load balancing mode.
> >
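Disconnects with no logged reason point toward the network, and this setup uses adaptive load balancing (balance-alb) bonding. Checking the bonding driver's per-slave link status and failure counters is therefore a cheap first step. The following Python sketch parses the text the Linux bonding driver exposes under `/proc/net/bonding/<bond>`. The device name `bond0` is a hypothetical placeholder. The field names (`Bonding Mode`, `Slave Interface`, `MII Status`, `Link Failure Count`) are what we expect that file to contain, so verify them against the real output.

```python
import os
from typing import Dict, List, Tuple


def parse_bonding(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split /proc/net/bonding/<bond> text into bond-level and per-slave fields."""
    bond: Dict[str, str] = {}
    slaves: List[Dict[str, str]] = []
    current = bond
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Each "Slave Interface" line starts a new per-slave section.
            current = {}
            slaves.append(current)
        current[key] = value
    return bond, slaves


def flaky_slaves(slaves: List[Dict[str, str]]) -> List[str]:
    """Slaves that are not up or have recorded at least one link failure."""
    return [s["Slave Interface"] for s in slaves
            if s.get("MII Status") != "up"
            or int(s.get("Link Failure Count", "0")) > 0]


if __name__ == "__main__":
    path = "/proc/net/bonding/bond0"  # hypothetical bond name; adjust to the real device
    if os.path.exists(path):
        with open(path) as f:
            bond, slaves = parse_bonding(f.read())
        print("mode:", bond.get("Bonding Mode"))
        print("slaves with link problems:", flaky_slaves(slaves))
```

Running it on the servers and the client before and after a failed rsync would show whether the failure counters moved during the transfer.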
> >     I've tried using the debug option for the client mount, but apart
> >     from a ~0.5TB log file I didn't get any information that seems
> >     helpful to me.
> >
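A ~0.5TB debug log cannot realistically be read by hand. One streaming pass that counts entries per level letter and per source function can at least show which component dominates the output and whether warnings or errors are buried in it. The following is a minimal Python sketch. It assumes the entry layout seen in the brick log above (`[timestamp] LEVEL [MSGID: n] [file.c:line:function] ...`, with the MSGID part optional). The level letters are counted as-is, without interpreting them.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Tuple

# Assumed entry layout, matching the log lines quoted in this thread:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] xlator: message
# (the MSGID part is optional).
ENTRY = re.compile(r"^\[[^\]]+\] ([A-Z]) (?:\[MSGID: \d+\] )?\[([^\]:]+):\d+:(\w+)\]")


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count log entries per level letter and per source file:function."""
    levels: Counter = Counter()
    sources: Counter = Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m:
            level, filename, func = m.groups()
            levels[level] += 1
            sources[f"{filename}:{func}"] += 1
    return levels, sources


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py <debug-client-log>
    with open(sys.argv[1], errors="replace") as log:
        levels, sources = tally(log)
    print("entries per level:", dict(levels))
    for source, count in sources.most_common(10):
        print(f"{count:>12}  {source}")
```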
> >     Transferring just a couple of GB works without problems.
> >
> >     It may very well be that I'm already blind to the obvious, but after
> >     many long-running tests I can't find the crux in the setup.
> >
> >     Does anyone have an idea how to approach this problem in a way
> >     that yields some useful information?
> >
> >     Any help is highly appreciated!
> >     Cheers
> >     Richard
> >
> >     --
> >     /dev/null
> >
> >     _______________________________________________
> >     Gluster-users mailing list
> >     Gluster-users at gluster.org
> >     https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
>
> --
> /dev/null
>

Richard Neuboeck

2018-Aug-31 05:41 UTC

[Gluster-users] gluster connection interrupted during transfer

On 08/31/2018 03:50 AM, Raghavendra Gowdappa wrote:
> +Mohit, +Milind
> 
> @Mohit/Milind,
> 
> Can you check logs and see whether you can find anything relevant?
Glancing at the system logs, nothing out of the ordinary occurred.
However, I'll start another rsync and take a closer look. It will take
a few days.
> 
> On Thu, Aug 30, 2018 at 7:04 PM, Richard Neuboeck
> <hawk at tbi.univie.ac.at> wrote:
> 
>     Hi,
> 
>     I'm attaching a shortened version of the client mount log, since the
>     whole log is about 5.8GB. It includes the initial mount messages and
>     the last two minutes of log entries.
> 
>     It ends rather anticlimactically, without an obvious error. Is there
>     anything specific I should be looking for?
> 
> 
> Normally I look at the logs around the disconnect messages to find out
> the reason. But as you said, sometimes one sees only the disconnect
> messages without any reason. That normally points to a cause in the
> network rather than a GlusterFS-initiated disconnect.
The rsync source is currently serving our home directories, so there are
NFS connections 24/7. There don't seem to be any network-related
interruptions; a co-worker would be here faster than I could check the
logs if the connection to home were broken ;-)
Because of this problem, the three gluster machines are currently used
only for testing, so nothing else is running on them.

-- 
/dev/null
