thr3ads.net - Gluster users - [Gluster-users] gluster connection interrupted during transfer [Aug 2018]


Richard Neuboeck

2018-Aug-29 12:41 UTC

[Gluster-users] gluster connection interrupted during transfer

Hi Gluster Community,

I'm seeing glusterfs 'Transport endpoint not connected' connection
aborts during file transfers. I can reproduce them (every time now)
but can't pinpoint why they happen.

The volume is set up in replica 3 mode and accessed with the fuse
gluster client. Both client and server are running CentOS and the
supplied 3.12.11 version of gluster.

The connection abort happens at different times during rsync but
occurs every time I try to sync all our files (1.1TB) to the empty
volume.

I don't find errors in the gluster log files on either the client or
the server side. rsync logs the obvious transfer problem. The only log
that shows anything related is the server brick log, which states that
the connection is shutting down:

[2018-08-18 22:40:35.502510] I [MSGID: 115036]
[server.c:527:server_rpc_notify] 0-home-server: disconnecting
connection from brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0
[2018-08-18 22:40:35.502620] W
[inodelk.c:499:pl_inodelk_log_cleanup] 0-home-server: releasing lock
on eaeb0398-fefd-486d-84a7-f13744d1cf10 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=d0fd5ffb427f0000}
[2018-08-18 22:40:35.502692] W
[entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock
on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.502719] W
[entrylk.c:864:pl_entrylk_log_cleanup] 0-home-server: releasing lock
on faa93f7b-6c46-4251-b2b2-abcd2f2613e1 held by
{client=0x7f83ec0b3ce0, pid=110423 lk-owner=703dd4cc407f0000}
[2018-08-18 22:40:35.505950] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-home-server: Shutting down
connection brax-110405-2018/08/16-08:36:28:575972-home-client-0-0-0

Since I've been running another replica 3 setup for oVirt for a long
time now, and it is completely stable, I first thought I had made a
mistake in setting different options. However, even after resetting
those options I'm able to reproduce the connection problem.

The unoptimized volume setup looks like this:

Volume Name: home
Type: Replicate
Volume ID: c92fa4cc-4a26-41ff-8c70-1dd07f733ac8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-four:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-six:/srv/gluster_home/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 50%
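
To compare this volume with another one (for example the stable oVirt volume), the "Options Reconfigured" section of two `gluster volume info` outputs can be diffed. This is only a sketch: it assumes the text layout shown above, and both function names are hypothetical.

```python
def reconfigured_options(volume_info_text):
    """Return the 'Options Reconfigured' section as a dict of option -> value."""
    options = {}
    in_section = False
    for raw in volume_info_text.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_section = True
            continue
        if in_section:
            if not line:
                break  # a blank line ends the section
            key, _, value = line.partition(": ")
            options[key] = value
    return options


def diff_options(a, b):
    """Map each differing option to (value_in_a, value_in_b); None means unset."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Feeding it the saved output for both volumes lists only the options that are set differently, which narrows down what to test first.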


The following additional options were used before:

performance.cache-size: 5GB
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-invalidation: on
network.inode-lru-limit: 50000
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 600
performance.parallel-readdir: on


In this case the gluster servers and also the client are using a
bonded network device running in adaptive load balancing mode.
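
One way to rule the bond in or out is to check whether any slave link has flapped. The sketch below assumes the usual text format of the Linux bonding status file (/proc/net/bonding/<bond name>), with "Slave Interface:" and "Link Failure Count:" lines; the function name is hypothetical and the bond name depends on the host.

```python
def link_failures(bonding_text):
    """Map each slave interface to its 'Link Failure Count'."""
    failures = {}
    slave = None
    for raw in bonding_text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if not sep:
            continue  # not a "key: value" line
        if key == "Slave Interface":
            slave = value
        elif key == "Link Failure Count" and slave is not None:
            failures[slave] = int(value)
    return failures
```

Comparing the counts before and after a failed rsync run on client and servers would show whether a link went down during the transfer.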

I've tried the debug option for the client mount, but apart from a
~0.5TB log file I didn't get any information that seems helpful to me.
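
A log of that size is easier to handle when cut down to the seconds around a known disconnect time taken from the brick log. A rough sketch, assuming every entry starts with a bracketed timestamp in the format shown in the brick log, that client and brick logs use the same clock and time zone, and that the file is in chronological order (the early exit relies on this); the function names are made up.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def entry_time(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    if not line.startswith("["):
        return None
    end = line.find("]")
    if end == -1:
        return None
    try:
        return datetime.strptime(line[1:end], TS_FORMAT)
    except ValueError:
        return None


def window(lines, center, seconds=60):
    """Yield lines logged within +/- `seconds` of `center`.

    Lines without a timestamp follow the decision made for the last
    timestamped line, so multi-line entries stay together.
    """
    lo = center - timedelta(seconds=seconds)
    hi = center + timedelta(seconds=seconds)
    keep = False
    for line in lines:
        ts = entry_time(line)
        if ts is not None:
            if ts > hi:
                break  # assumes chronological order, so nothing later matches
            keep = ts >= lo
        if keep:
            yield line
```

Because `window` takes any iterable of lines, an open file object can be passed in directly and the log is streamed rather than loaded into memory.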

Transferring just a couple of GB works without problems.

It may very well be that I'm already blind to the obvious but after
many long running tests I can't find the crux in the setup.

Does anyone have an idea how to approach this problem in a way that
yields some useful information?

Any help is highly appreciated!
Cheers
Richard

-- 
/dev/null




Nithya Balachandran

2018-Aug-30 07:45 UTC


[Gluster-users] gluster connection interrupted during transfer

Hi Richard,



On 29 August 2018 at 18:11, Richard Neuboeck <hawk at tbi.univie.ac.at>
wrote:
> Since I'm running another replica 3 setup for oVirt for a long time
>
Is this setup running with the same gluster version and on the same nodes
or is it a different cluster?


> now which is completely stable I thought I made a mistake setting
> different options at first. However even when I reset those options
> I'm able to reproduce the connection problem.

Raghavendra Gowdappa

2018-Aug-30 12:40 UTC


[Gluster-users] gluster connection interrupted during transfer

Normally client logs will give a clue on why the disconnections are
happening (ping-timeout, wrong port etc). Can you look into client logs to
figure out what's happening? If you can't find anything, can you send
across client logs?

