Tiago Santos
2015-Jan-26 18:50 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Thanks for your input, Anirban. I ran the commands on both servers, with the following results:

    root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png

    real    0m34.524s
    user    0m0.004s
    sys     0m0.000s

    root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
    getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error

    real    0m11.315s
    user    0m0.001s
    sys     0m0.003s

    root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
    ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error

I'm not sure whether that elucidates the issue.

Also, I saw a zillion entries like these in /var/log/gluster.log:

    [2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9616964 (00000000-0000-0000-0000-000000000000)
    [2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9594915 (00000000-0000-0000-0000-000000000000)
    [2015-01-26 17:35:39.973571] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9681971 (00000000-0000-0000-0000-000000000000)
    [2015-01-26 17:35:39.973686] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/19615 (00000000-0000-0000-0000-000000000000)
    [2015-01-26 17:35:39.973802] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/130392 (00000000-0000-0000-0000-000000000000)

I have talked with some people on #gluster who pointed out that it could be a network issue. I'm still looking into it, but since the problem also happens locally (within the same server), would that still be a valid explanation?

Also, less often, I see entries like these:

    [2015-01-26 17:41:25.956418] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png
    [2015-01-26 17:41:26.588753] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png

Are those a definitive indication of a split-brain, or just something usual until self-heal takes care of recently updated files?

On Mon, Jan 26, 2015 at 2:25 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:
> I am plagued with something of this sort, too!
>
> What I mostly see when I explore these things is that
>
> A) it's a split-brain.
> B) the split-brain is because the gfid's on the two replicas are at odds.
>
> You could check that out by
> 1. On each server, first 'cd' to where your brick is mounted.
> 2. getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>
> You will see a trusted.gfid kind of extended attribute. If it's not the
> same on both servers, there's a problem.
>
> Thanks,
> Anirban

Regards,
--
Tiago Santos
MustHaveMenus.com
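The gfid comparison suggested in the quoted message could be scripted roughly as below. This is only a sketch: the helper names `extract_gfid` and `compare_gfid` are hypothetical, and it assumes the usual `getfattr -e hex` output, in which each attribute appears on its own line as `name=0x...`. It compares two saved attribute dumps and does not contact either server itself.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of GlusterFS.

# extract_gfid: read a `getfattr -m . -d -e hex` dump on stdin and
# print the value of trusted.gfid (prints nothing if it is absent).
extract_gfid() {
    sed -n 's/^trusted\.gfid=//p'
}

# compare_gfid DUMP_A DUMP_B: compare the gfid found in two dumps.
# Prints "missing" if either dump has no trusted.gfid line,
# "match" if both gfids are equal, "MISMATCH" otherwise.
compare_gfid() {
    gfid_a=$(printf '%s\n' "$1" | extract_gfid)
    gfid_b=$(printf '%s\n' "$2" | extract_gfid)
    if [ -z "$gfid_a" ] || [ -z "$gfid_b" ]; then
        echo "missing"
    elif [ "$gfid_a" = "$gfid_b" ]; then
        echo "match"
    else
        echo "MISMATCH"
    fi
}
```

To use it, save the getfattr output from each server to a file (run as root from the brick directory, as described in step 1), copy both files to one host, and call `compare_gfid "$(cat dump.web3)" "$(cat dump.web4)"`; the file names here are placeholders. A "missing" result suggests the dump did not include a trusted.gfid attribute at all.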
A Ghoshal
2015-Jan-26 20:16 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Actually, you ran getfattr on the volume mount, which is why the requisite extended attributes never showed up. Your bricks are mounted elsewhere: /exports/images1-1/brick and exports/images2-1/brick.

Btw, what version of Linux do you use? And are the files on which you observe the input/output errors soft-links?
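Running the check against a brick rather than the mounted volume might look like the sketch below. The helper names `brick_path` and `show_brick_xattrs` are hypothetical, and which brick directory lives on which server is an assumption to verify locally (for example with `gluster volume info`, using the volume name, which the log prefix suggests is site-images).

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of GlusterFS.

# brick_path BRICK_ROOT REL_PATH: build the on-brick path for a file,
# given the brick directory and the file's path relative to the
# volume root. One trailing/leading slash is tolerated.
brick_path() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# show_brick_xattrs BRICK_ROOT REL_PATH: dump all extended attributes
# of the file as stored on the brick. Needs root to read trusted.*.
show_brick_xattrs() {
    getfattr -m . -d -e hex "$(brick_path "$1" "$2")"
}

# Example (assumes this server hosts /exports/images1-1/brick):
#   show_brick_xattrs /exports/images1-1/brick \
#       templates/assets/prod/temporary/13/user_1339200.png
```

The trusted.gfid line printed on each server can then be compared by eye or with any diff tool.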