Are your files split brained:

gluster v heal img info split-brain

I see a lot of problems with your self heal daemon connecting:

[2015-04-29 16:15:37.137215] E [socket.c:2161:socket_connect_finish] 0-img-client-4: connection to 192.168.114.185:49154 failed (Connection refused)
[2015-04-29 16:15:37.434035] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:15:40.308730] E [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-2: getxattr failed on img-client-5 - (Transport endpoint is not connected)
[2015-04-29 16:15:40.308878] E [afr-self-heald.c:1479:afr_find_child_position] 0-img-replicate-1: getxattr failed on img-client-3 - (Transport endpoint is not connected)
[2015-04-29 16:15:41.192965] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-3: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:20:23.184879] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:24007 failed (Connection refused)
[2015-04-29 16:21:01.684625] E [client-handshake.c:1760:client_query_portmap_cbk] 0-img-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-04-29 16:24:14.211163] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:18.213126] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:22.212902] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:26.213708] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:30.214324] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)
[2015-04-29 16:24:34.214816] E [socket.c:2161:socket_connect_finish] 0-img-client-1: connection to 192.168.114.182:49152 failed (Connection refused)

There looks to have been some network flapping up and down, and files may have become split brained.

Whenever you are bouncing services I usually do:

$ service glusterd stop
$ killall glusterfs
$ killall glusterfsd
$ ps aux | grep glu   <- Make sure everything is actually cleaned up

Any time you take a node offline and bring it back online, make sure the files get resynced with a self heal before you take any other nodes offline:

$ gluster v heal img full

If you do see split brained files you can resolve them with:

http://blog.gluster.org/category/howtos/
https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/

LMK if you see any split brained files.

-b

----- Original Message -----
> From: "Alex" <alex.m at icecat.biz>
> To: gluster-users at gluster.org
> Sent: Thursday, April 30, 2015 9:26:04 AM
> Subject: Re: [Gluster-users] Write operations failing on clients
>
> Oh and this is output of some status commands:
> http://termbin.com/bvzz
>
> Mount\umount worked just fine.
>
> Alex
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
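The stop/verify/heal cycle Ben describes above can be sketched as one small script. This is only a sketch under assumptions from the thread: the volume is named `img`, SysV-style `service` management is in use, and the guard lets it no-op on a machine without gluster installed.

```shell
# Sketch of the bounce-and-heal sequence from the advice above.
# ASSUMPTIONS: volume is named `img`; SysV `service` init is in use.
if command -v gluster >/dev/null 2>&1; then
    service glusterd stop
    killall glusterfs 2>/dev/null
    killall glusterfsd 2>/dev/null
    ps aux | grep '[g]lu'        # make sure everything is actually cleaned up

    service glusterd start
    gluster v heal img full      # trigger a full self heal
    gluster v heal img info      # re-run until every brick reports 0 entries
    result="heal triggered"
else
    result="gluster not installed; skipped"
fi
echo "$result"
```

The `grep '[g]lu'` trick keeps the grep process itself out of the `ps` output, so an empty result really means everything is cleaned up.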
Also I see:

/var/log/glusterfs/img-rebalance.log-20150430

[2015-04-29 14:49:40.793369] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-img-dht: Fix layout failed for /www/thumbs
[2015-04-29 14:49:40.793625] E [dht-rebalance.c:1515:gf_defrag_fix_layout] 0-img-dht: Fix layout failed for /www

Have you recently run a rebalance? Are you having trouble accessing those directories? It looks like the fix layout failed for those two.

-b

----- Original Message -----
> From: "Ben Turner" <bturner at redhat.com>
> To: "Alex" <alex.m at icecat.biz>
> Cc: gluster-users at gluster.org
> Sent: Thursday, April 30, 2015 5:10:39 PM
> Subject: Re: [Gluster-users] Write operations failing on clients
>
> Are your files split brained:
>
> gluster v heal img info split-brain
>
> I see a lot of problems with your self heal daemon connecting:
> [quoted logs and earlier messages snipped]
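Since the rebalance log shows fix-layout failing for /www and /www/thumbs, one plausible next step (which Alex also mentions below) is to re-run it once all bricks are reachable. A hedged sketch, again assuming the volume name `img` and guarding for machines without gluster:

```shell
# ASSUMPTION: the fix-layout failures were caused by the down bricks, so a
# retry once all bricks are up is worth attempting. Volume name `img` assumed.
if command -v gluster >/dev/null 2>&1; then
    gluster volume rebalance img fix-layout start
    gluster volume rebalance img status   # watch progress per node
    rebal="fix-layout started"
else
    rebal="gluster not installed; skipped"
fi
echo "$rebal"
```

Checking `gluster volume status` first, to confirm every brick process is online, avoids repeating the same failure.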
> Are your files split brained:
>
> gluster v heal img info split-brain
>
> I see a lot of problems with your self heal daemon connecting:

As far as I can see, the nodes are not split brained:

# gluster v heal img info split-brain
Gathering list of split brain entries on volume img has been successful

Brick gluster1:/var/gl/images
Number of entries: 0

Brick gluster2:/var/gl/images
Number of entries: 0

Brick gluster3:/var/gl/images
Number of entries: 0

Brick gluster4:/var/gl/images
Number of entries: 0

Brick gluster5:/var/gl/images
Number of entries: 0

Brick gluster6:/var/gl/images
Number of entries: 0

> $ service glusterd stop
> $ killall glusterfs
> $ killall glusterfsd
> $ ps aux | grep glu <- Make sure everything is actually cleaned up

Yes, I actually did this in the first place with the problematic nodes. Unfortunately it didn't help; the CPU load came back in about 3-4 minutes.

> Have you recently run a rebalance?

A rebalance was running when the problem occurred, and I stopped it to see if it was causing the problems. I'll try to run it again.

> Are you having trouble accessing those directories? It looks like the fix layout failed for those two.

I can access those dirs via the gluster client:

# grep gluster /etc/fstab
gluster1:/img /media glusterfs defaults,_netdev 0 1

# ls -la /media/www/ | wc -l
47

/www/thumbs holds an excessive number of files, so I just stat something inside it:

# ls -l /media/www/thumbs/1000025.jpg
-rw-r--r-- 1 apache apache 4365 Oct 8 2009 /media/www/thumbs/1000025.jpg

Everything looks fine.

Thank you,
Alex
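The per-brick counts Alex pastes above can be summed mechanically, which is handy with many bricks. A small sketch: the heredoc stubs the captured output of `gluster v heal img info split-brain` so the pipeline runs anywhere; on a real node, pipe the command itself into the same awk.

```shell
# Sum the "Number of entries" lines; a nonzero total means split-brain files.
# The heredoc below is the stubbed command output copied from this thread.
total=$(awk '/^Number of entries:/ {sum += $4} END {print sum+0}' <<'EOF'
Gathering list of split brain entries on volume img has been successful
Brick gluster1:/var/gl/images
Number of entries: 0
Brick gluster2:/var/gl/images
Number of entries: 0
Brick gluster3:/var/gl/images
Number of entries: 0
Brick gluster4:/var/gl/images
Number of entries: 0
Brick gluster5:/var/gl/images
Number of entries: 0
Brick gluster6:/var/gl/images
Number of entries: 0
EOF
)
echo "total split-brain entries: $total"
# -> total split-brain entries: 0
```

On a live node the first line would instead read `total=$(gluster v heal img info split-brain | awk ...)`.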