thr3ads.net - Gluster users - [Gluster-users] single problematic node (brick) [May 2014]

Doug Schouten

2014-May-20 18:16 UTC

[Gluster-users] single problematic node (brick)

Hello,

I have a rather simple Gluster configuration that consists of 85TB
distributed across six nodes. One particular node seems to fail roughly
once a week, and I can't figure out why.

I have attached my Gluster configuration and a recent log file from the
problematic node. For a user, when the failure occurs, the symptom is
that any attempt to access the Gluster volume from the problematic node
fails with a "transport endpoint not connected" error.

Restarting the Gluster daemons and remounting the volume on the failed
node always fixes the problem. But usually by that point some number of
jobs in our batch queue have already failed because of this issue, and
it's becoming a headache.
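
A minimal sketch of how the failure could be detected before batch jobs hit it, assuming Python is available on the node. It relies only on the fact that "transport endpoint is not connected" is the message for errno ENOTCONN. The function name and the injectable `stat_fn` parameter are hypothetical, introduced here for illustration.

```python
import errno
import os


def mount_is_disconnected(path, stat_fn=os.stat):
    """Return True if stat() on path fails with ENOTCONN.

    ENOTCONN is the errno behind the "Transport endpoint is not connected"
    message. stat_fn is injectable so the check can be exercised without a
    real broken mount.
    """
    try:
        stat_fn(path)
    except OSError as exc:
        # Any other error (e.g. a missing path) is not the symptom
        # described in this thread, so it is reported as False.
        return exc.errno == errno.ENOTCONN
    return False
```

Calling `mount_is_disconnected("/global")` from a periodic job would flag the symptom described above. What to do next (restarting the daemons, remounting) is left out, since the exact commands depend on the installation.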

It could be a fuse issue, since I see many related error messages in the
Gluster log, but I can't disentangle the various errors. The relevant
line in my /etc/fstab file is

server:global /global glusterfs defaults,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log 0 0
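
The fstab entry above is one logical line made of six whitespace-separated fields, with the fourth field holding the comma-separated mount options. As a rough illustration (the helper name `parse_fstab_entry` is hypothetical, and this is not how mount itself parses the file), the entry can be broken down like this:

```python
def parse_fstab_entry(line):
    """Split one fstab line into its six whitespace-separated fields."""
    device, mount_point, fs_type, options, dump, fsck_pass = line.split()
    parsed_options = {}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        # Flags such as "defaults" carry no value; store None for them.
        parsed_options[key] = value if sep else None
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": parsed_options,
        "dump": int(dump),
        "pass": int(fsck_pass),
    }


# The entry quoted in this post, joined into a single line.
ENTRY = ("server:global /global glusterfs "
         "defaults,direct-io-mode=disable,log-level=WARNING,"
         "log-file=/var/log/gluster.log 0 0")
entry = parse_fstab_entry(ENTRY)
```

Here `entry["options"]` maps `log-file` to `/var/log/gluster.log`, which is the client log to inspect on the failing node.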

Any ideas on the source of the problem? Could it be a hardware (network)
glitch? The fact that it only happens on one node, which is configured
identically to the other nodes (with the same hardware), points to
something like that.

thanks! Doug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster.log.gz
Type: application/gzip
Size: 19765 bytes
Desc: not available
URL:
<http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140520/4eff4638/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster.cfg.gz
Type: application/gzip
Size: 429 bytes
Desc: not available
URL:
<http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140520/4eff4638/attachment-0001.bin>

Franco Broi

2014-May-21 03:37 UTC

[Gluster-users] single problematic node (brick)

Are you running out of memory? How much memory are the gluster daemons
using?
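
One way to answer the memory question on a Linux node is to read the resident set size from /proc. The sketch below is an illustration under stated assumptions: it is Linux-only, it assumes the Gluster processes have names starting with "gluster" (as glusterd, glusterfsd and glusterfs do), and the function names are hypothetical.

```python
import os


def parse_status(status_text):
    """Extract the process name and resident set size (kB) from the text of
    /proc/<pid>/status. VmRSS is absent for kernel threads, so the second
    element of the result may be None."""
    name, rss_kb = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def gluster_memory(proc_root="/proc"):
    """Return [(pid, name, rss_kb)] for processes named 'gluster*'."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if name and name.startswith("gluster") and rss_kb is not None:
            results.append((int(entry), name, rss_kb))
    return results
```

Logging the output of `gluster_memory()` at intervals over a week would show whether memory use on the problematic node grows steadily before each failure.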

