Johnson, Tim
2018-Aug-31 15:19 UTC
[Gluster-users] Transport endpoint is not connected : issue
Hello all,

We have gluster replicate (with arbiter) volumes that are reporting "Transport endpoint is not connected" on a rotating basis from each of the two file servers, and from a third host that holds the arbiter bricks. This happens when trying to run a heal on all the volumes on the gluster hosts; when I get the status of all the volumes, everything looks good.

This behavior seems to be a foreshadowing of the gluster volumes becoming unresponsive to our VM cluster. In addition, one of the file servers has two processes for each of the volumes instead of one per volume (a quick check for this is included after the status output below). Eventually the affected file server drops off the list of peers. Restarting glusterd/glusterfsd on the affected file server does not take care of the issue; we have to bring down both file servers because the VM cluster can no longer see the volumes after the errors start occurring. I had seen bug reports about "Transport endpoint is not connected" on earlier versions of Gluster, but had thought it had been addressed.

dmesg did have some entries for "a possible SYN flood on port *", so we set "net.ipv4.tcp_max_syn_backlog = 2048" via sysctl, which seemed to quiet the SYN flood messages but not the underlying volume issues (how we applied the setting is sketched at the end of this message, after the package list).

I have put the versions of all the installed Gluster packages below, as well as "heal" and "status" output showing the state the volumes are in.

This has just started happening, but I cannot definitively say whether it began after an update.

Thanks for any assistance.

Running heal:

gluster volume heal ovirt_engine info
Brick ****1.rrc.local:/bricks/brick0/ovirt_engine
Status: Connected
Number of entries: 0

Brick ****3.rrc.local:/bricks/brick0/ovirt_engine
Status: Transport endpoint is not connected
Number of entries: -

Brick *****3.rrc.local:/bricks/arb-brick/ovirt_engine
Status: Transport endpoint is not connected
Number of entries: -

Running status:

gluster volume status ovirt_engine
Status of volume: ovirt_engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick *****.rrc.local:/bricks/brick0/ov
irt_engine                                  49152     0          Y       5521
Brick fs2-tier3.rrc.local:/bricks/brick0/ov
irt_engine                                  49152     0          Y       6245
Brick ****.rrc.local:/bricks/arb-b
rick/ovirt_engine                           49152     0          Y       3526
Self-heal Daemon on localhost               N/A       N/A        Y       5509
Self-heal Daemon on ***.rrc.local           N/A       N/A        Y       6218
Self-heal Daemon on ***.rrc.local           N/A       N/A        Y       3501
Self-heal Daemon on ****.rrc.local          N/A       N/A        Y       3657
Self-heal Daemon on *****.rrc.local         N/A       N/A        Y       3753
Self-heal Daemon on ****.rrc.local          N/A       N/A        Y       17284

Task Status of Volume ovirt_engine
------------------------------------------------------------------------------
There are no active volume tasks
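For anyone wanting to check for the doubled-up brick processes, something like the following works (a generic process listing, not a Gluster-specific diagnostic; each brick should normally have exactly one glusterfsd):

# list brick daemons; the brick path appears in each process's arguments,
# so two lines naming the same brick path indicate a stale/duplicate process
ps -ef | grep '[g]lusterfsd'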
/etc/glusterd.vol:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option rpc-auth-allow-insecure on
#   option transport.address-family inet6
#   option base-port 49152
end-volume

rpm -qa | grep gluster
glusterfs-3.12.13-1.el7.x86_64
glusterfs-gnfs-3.12.13-1.el7.x86_64
glusterfs-api-3.12.13-1.el7.x86_64
glusterfs-cli-3.12.13-1.el7.x86_64
glusterfs-client-xlators-3.12.13-1.el7.x86_64
glusterfs-fuse-3.12.13-1.el7.x86_64
centos-release-gluster312-1.0-2.el7.centos.noarch
glusterfs-rdma-3.12.13-1.el7.x86_64
glusterfs-libs-3.12.13-1.el7.x86_64
glusterfs-server-3.12.13-1.el7.x86_64
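As mentioned above, this is roughly how the SYN backlog setting was applied (a minimal sketch assuming stock sysctl behavior on CentOS 7; the drop-in file name is our own choice, not anything standard):

# apply immediately, without a reboot
sysctl -w net.ipv4.tcp_max_syn_backlog=2048

# persist across reboots via a sysctl drop-in (file name is arbitrary)
echo 'net.ipv4.tcp_max_syn_backlog = 2048' > /etc/sysctl.d/98-syn-backlog.conf
sysctl --system    # re-read all sysctl configuration files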
Atin Mukherjee
2018-Aug-31 18:03 UTC
[Gluster-users] Transport endpoint is not connected : issue
Can you please pass along all of the gluster log files from the server where the "Transport endpoint is not connected" error is reported? Since restarting glusterd didn't solve the issue, I believe this isn't a stale port problem but something else.

Also, please provide the output of "gluster v info <volname>". (@cc Ravi, Karthik)
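If it is convenient, something along these lines should capture everything in one go (a sketch assuming the default log location /var/log/glusterfs; the archive name is only an example, and ovirt_engine is taken from your output):

# bundle up all gluster logs on the affected server
tar czf /tmp/gluster-logs-$(hostname -s).tar.gz /var/log/glusterfs

# volume configuration as glusterd sees it
gluster v info ovirt_engine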
--
- Atin (atinm)