Are your files split brained:

gluster v heal img info split-brain

I see a lot of problems with your self heal daemon connecting:

[2015-04-29 16:15:37.137215] E [socket.c:2161:socket_connect_finish] 0-img-client-4: connection to 192.168.114.185:49154 failed (Connection refused)
[2015-04-29 16:15:37.434035] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:15:40.308730] E [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-2: getxattr failed on img-client-5 - (Transport endpoint is not connected)
[2015-04-29 16:15:40.308878] E [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-1: getxattr failed on img-client-3 - (Transport endpoint is not connected)
[2015-04-29 16:15:41.192965] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-3: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:20:23.184879] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:24007 failed (Connection refused)
[2015-04-29 16:21:01.684625] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:24:14.211163] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:18.213126] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:22.212902] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:26.213708] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:30.214324] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:34.214816] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)

There looks to have been some network flapping up and down, and files may have become split brained.

Whenever you are bouncing services I usually do:

$ service glusterd stop
$ killall glusterfs
$ killall glusterfsd
$ ps aux | grep glu   <- Make sure everything is actually cleaned up

Any time you take a node offline and bring it back online, make sure the files get resynced with a self heal before you take any other nodes offline:

$ gluster v heal img full

If you do see split brained files you can resolve them with:

http://blog.gluster.org/category/howtos/
https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/

LMK if you see any split brained files.

-b

----- Original Message -----
> From: "Alex" <alex.m at icecat.biz>
> To: gluster-users at gluster.org
> Sent: Thursday, April 30, 2015 9:26:04 AM
> Subject: Re: [Gluster-users] Write operations failing on clients
>
> Oh and this is output of some status commands:
> http://termbin.com/bvzz
>
> Mount\umount worked just fine.
>
> Alex
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
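The stop/verify/heal cycle Ben describes above can be sketched as one small script. This is only a sketch under assumptions from the thread: the volume is named `img`, SysV-style `service` management is in use, and the guard lets it no-op on a machine without gluster installed.

```shell
# Sketch of the bounce-and-heal sequence from the advice above.
# ASSUMPTIONS: volume is named `img`; SysV `service` init is in use.
if command -v gluster >/dev/null 2>&1; then
    service glusterd stop
    killall glusterfs 2>/dev/null
    killall glusterfsd 2>/dev/null
    ps aux | grep '[g]lu'        # make sure everything is actually cleaned up

    service glusterd start
    gluster v heal img full      # trigger a full self heal
    gluster v heal img info      # re-run until every brick reports 0 entries
    result="heal triggered"
else
    result="gluster not installed; skipped"
fi
echo "$result"
```

The `grep '[g]lu'` trick keeps the grep process itself out of the `ps` output, so an empty result really means everything is cleaned up.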
Also I see:

/var/log/glusterfs/img-rebalance.log-20150430

[2015-04-29 14:49:40.793369] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-img-dht: Fix layout failed for /www/thumbs
[2015-04-29 14:49:40.793625] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-img-dht: Fix layout failed for /www

Have you recently run a rebalance? Are you having trouble accessing those directories? It looks like the fix layout failed for those two.

-b

----- Original Message -----
> From: "Ben Turner" <bturner at redhat.com>
> To: "Alex" <alex.m at icecat.biz>
> Cc: gluster-users at gluster.org
> Sent: Thursday, April 30, 2015 5:10:39 PM
> Subject: Re: [Gluster-users] Write operations failing on clients
>
> Are your files split brained:
>
> gluster v heal img info split-brain
>
> I see a lot of problems with your self heal daemon connecting:
> [quoted logs and earlier messages snipped]
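Since the rebalance log shows fix-layout failing for /www and /www/thumbs, one plausible next step (which Alex also mentions below) is to re-run it once all bricks are reachable. A hedged sketch, again assuming the volume name `img` and guarding for machines without gluster:

```shell
# ASSUMPTION: the fix-layout failures were caused by the down bricks, so a
# retry once all bricks are up is worth attempting. Volume name `img` assumed.
if command -v gluster >/dev/null 2>&1; then
    gluster volume rebalance img fix-layout start
    gluster volume rebalance img status   # watch progress per node
    rebal="fix-layout started"
else
    rebal="gluster not installed; skipped"
fi
echo "$rebal"
```

Checking `gluster volume status` first, to confirm every brick process is online, avoids repeating the same failure.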
> Are your files split brained:
>
> gluster v heal img info split-brain
>
> I see a lot of problems with your self heal daemon connecting:

As far as I can see, the nodes are not split brained:

# gluster v heal img info split-brain
Gathering list of split brain entries on volume img has been successful

Brick gluster1:/var/gl/images
Number of entries: 0

Brick gluster2:/var/gl/images
Number of entries: 0

Brick gluster3:/var/gl/images
Number of entries: 0

Brick gluster4:/var/gl/images
Number of entries: 0

Brick gluster5:/var/gl/images
Number of entries: 0

Brick gluster6:/var/gl/images
Number of entries: 0

> $ service glusterd stop
> $ killall glusterfs
> $ killall glusterfsd
> $ ps aux | grep glu <- Make sure everything is actually cleaned up

Yes, I actually did this in the first place with the problematic nodes. Unfortunately it didn't help; the CPU load came back in about 3-4 minutes.

> Have you recently run a rebalance?

A rebalance was running when the problem occurred, and I stopped it to see if it was causing the problems. I'll try to run it again.

> Are you having trouble accessing those directories? It looks like the fix layout failed for those two.

I can access those dirs via the gluster client:

# grep gluster /etc/fstab
gluster1:/img /media glusterfs defaults,_netdev 0 1

# ls -la /media/www/ | wc -l
47

/www/thumbs holds an excessive number of files, so I just stat something inside it:

# ls -l /media/www/thumbs/1000025.jpg
-rw-r--r-- 1 apache apache 4365 Oct 8 2009 /media/www/thumbs/1000025.jpg

Everything looks fine.

Thank you,
Alex
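The per-brick counts Alex pastes above can be summed mechanically, which is handy with many bricks. A small sketch: the heredoc stubs the captured output of `gluster v heal img info split-brain` so the pipeline runs anywhere; on a real node, pipe the command itself into the same awk.

```shell
# Sum the "Number of entries" lines; a nonzero total means split-brain files.
# The heredoc below is the stubbed command output copied from this thread.
total=$(awk '/^Number of entries:/ {sum += $4} END {print sum+0}' <<'EOF'
Gathering list of split brain entries on volume img has been successful
Brick gluster1:/var/gl/images
Number of entries: 0
Brick gluster2:/var/gl/images
Number of entries: 0
Brick gluster3:/var/gl/images
Number of entries: 0
Brick gluster4:/var/gl/images
Number of entries: 0
Brick gluster5:/var/gl/images
Number of entries: 0
Brick gluster6:/var/gl/images
Number of entries: 0
EOF
)
echo "total split-brain entries: $total"
# -> total split-brain entries: 0
```

On a live node the first line would instead read `total=$(gluster v heal img info split-brain | awk ...)`.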