A Ghoshal
2015-Jan-26 21:04 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Yep, so it is indeed a split-brain caused by a mismatch of the trusted.gfid attribute.

Sadly, I don't know precisely what causes it. Communication loss might be one of the triggers. I am guessing the files with the problem are dynamic, correct? In our setup (also replica 2), communication is never a problem, but we do see this when one of the servers reboots. Maybe some obscure, hard-to-understand race between background self-heal and the self-heal daemon...

In any case, the normal procedure for split-brain recovery should work for you if you wish to get your files working again. It's easy to find on Google. I use the instructions on Joe Julian's blog page myself.

-----Tiago Santos <tiago at musthavemenus.com> wrote: -----

======================
To: A Ghoshal <a.ghoshal at tcs.com>
From: Tiago Santos <tiago at musthavemenus.com>
Date: 01/27/2015 02:11AM
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
======================
Oh, right!

Here are the outputs:

root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527

real    0m0.024s
user    0m0.001s
sys     0m0.001s

root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3

real    0m0.003s
user    0m0.000s
sys     0m0.006s

Not sure exactly what that means. I'm googling, and would appreciate it if you guys could shed some light.

Thanks!
--
Tiago

On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:

> Actually, you ran getfattr on the mounted volume, which is why the
> requisite extended attributes never showed up...
>
> Your bricks are mounted elsewhere:
> /export/images1-1/brick and /export/images2-1/brick
>
> Btw, what version of Linux do you use? And are the files on which you
> observe the input/output errors soft links?
>
> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>
> ======================
> To: A Ghoshal <a.ghoshal at tcs.com>
> From: Tiago Santos <tiago at musthavemenus.com>
> Date: 01/27/2015 12:20AM
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Pretty much any operation related to Gluster
> mounted fs hangs for a while
> ======================
> Thanks for your input, Anirban.
>
> I ran the commands on both servers, with the following results:
>
> root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>
> real    0m34.524s
> user    0m0.004s
> sys     0m0.000s
>
> root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
> getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error
>
> real    0m11.315s
> user    0m0.001s
> sys     0m0.003s
> root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error

=====-----=====-----=====
Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
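The brick-level getfattr outputs posted in this message can be compared mechanically. The following is a minimal Python sketch, not anything from the thread itself: the helper names (parse_getfattr, decode_afr, gfid_mismatch) are made up for illustration, and the reading of each trusted.afr value as three 32-bit big-endian pending counters (data, metadata, entry) is an assumption about the AFR changelog layout that the thread does not state.

```python
import struct

# getfattr output as posted in the thread (timing lines omitted).
WEB3 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
"""

WEB4 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
"""


def parse_getfattr(text):
    """Parse `getfattr -m . -d -e hex` output into {name: hex digits without 0x}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# file:" header
        name, sep, value = line.partition("=")
        if sep and value.startswith("0x"):
            attrs[name] = value[2:]
    return attrs


def decode_afr(hex_value):
    """Split a 12-byte changelog value into (data, metadata, entry) counters.

    ASSUMPTION: three unsigned 32-bit big-endian integers, in that order.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte changelog value")
    return struct.unpack(">III", raw)


def gfid_mismatch(a, b):
    """True when the two bricks report different trusted.gfid values."""
    return a.get("trusted.gfid") != b.get("trusted.gfid")


if __name__ == "__main__":
    web3, web4 = parse_getfattr(WEB3), parse_getfattr(WEB4)
    print("gfid mismatch:", gfid_mismatch(web3, web4))
    for host, attrs in (("web3", web3), ("web4", web4)):
        for name, value in sorted(attrs.items()):
            if name.startswith("trusted.afr."):
                print(host, name, decode_afr(value))
```

Under that assumption, web3 records pending operations against both clients while web4 records none, and the two bricks disagree on trusted.gfid, which is the mismatch described at the top of this message.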
Joe Julian
2015-Jan-26 21:23 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Mismatched GFIDs would happen if a file is created on multiple replicas during a split-brain event. The GFID is assigned at file creation.

On 01/26/2015 01:04 PM, A Ghoshal wrote:
> Yep, so it is indeed a split-brain caused by a mismatch of the trusted.gfid attribute.
>
> Sadly, I don't know precisely what causes it. Communication loss might be one of the triggers. I am guessing the files with the problem are dynamic, correct? In our setup (also replica 2), communication is never a problem, but we do see this when one of the servers reboots. Maybe some obscure, hard-to-understand race between background self-heal and the self-heal daemon...
>
> In any case, the normal procedure for split-brain recovery should work for you if you wish to get your files working again. It's easy to find on Google. I use the instructions on Joe Julian's blog page myself.
>
> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>
> ======================
> To: A Ghoshal <a.ghoshal at tcs.com>
> From: Tiago Santos <tiago at musthavemenus.com>
> Date: 01/27/2015 02:11AM
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
> ======================
> Oh, right!
>
> Here are the outputs:
>
> root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
> # file: templates/assets/prod/temporary/13/user_1339200.png
> trusted.afr.site-images-client-0=0x000000000000000400000000
> trusted.afr.site-images-client-1=0x000000020000000900000000
> trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
>
> real    0m0.024s
> user    0m0.001s
> sys     0m0.001s
>
> root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
> # file: templates/assets/prod/temporary/13/user_1339200.png
> trusted.afr.site-images-client-0=0x000000000000000000000000
> trusted.afr.site-images-client-1=0x000000000000000000000000
> trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
>
> real    0m0.003s
> user    0m0.000s
> sys     0m0.006s
>
> Not sure exactly what that means. I'm googling, and would appreciate it if you
> guys could shed some light.
>
> Thanks!
> --
> Tiago
>
> On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:
>
>> Actually, you ran getfattr on the mounted volume, which is why the
>> requisite extended attributes never showed up...
>>
>> Your bricks are mounted elsewhere:
>> /export/images1-1/brick and /export/images2-1/brick
>>
>> Btw, what version of Linux do you use? And are the files on which you
>> observe the input/output errors soft links?
>>
>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>>
>> ======================
>> To: A Ghoshal <a.ghoshal at tcs.com>
>> From: Tiago Santos <tiago at musthavemenus.com>
>> Date: 01/27/2015 12:20AM
>> Cc: gluster-users <gluster-users at gluster.org>
>> Subject: Re: [Gluster-users] Pretty much any operation related to Gluster
>> mounted fs hangs for a while
>> ======================
>> Thanks for your input, Anirban.
>>
>> I ran the commands on both servers, with the following results:
>>
>> root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>>
>> real    0m34.524s
>> user    0m0.004s
>> sys     0m0.000s
>>
>> root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>> getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error
>>
>> real    0m11.315s
>> user    0m0.001s
>> sys     0m0.003s
>> root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
>> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
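To make the point about GFIDs being assigned at file creation more concrete, the two trusted.gfid values quoted above can be rendered as UUIDs. The following is a small Python sketch; the helper names are hypothetical, and the .glusterfs/<first two hex digits>/<next two>/<uuid> location on each brick is an assumption about the usual brick layout, not something stated in this thread. It only computes paths and should not be read as a recovery procedure.

```python
import uuid
from pathlib import PurePosixPath


def gfid_to_uuid(hex_value):
    """Render a trusted.gfid hex value (with or without 0x) as a UUID string."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    return str(uuid.UUID(hex=hex_value))


def gfid_link_path(brick_root, hex_value):
    """Path where a brick would keep the GFID link for a file.

    ASSUMPTION: layout is <brick>/.glusterfs/<aa>/<bb>/<uuid>, where aa and bb
    are the first two pairs of hex digits of the UUID.
    """
    u = gfid_to_uuid(hex_value)
    return PurePosixPath(brick_root) / ".glusterfs" / u[:2] / u[2:4] / u


if __name__ == "__main__":
    # Brick paths and GFIDs as they appear in the quoted getfattr output.
    print(gfid_link_path("/export/images1-1/brick",
                         "0x10e5894c474a4cb1898b71e872cdf527"))
    print(gfid_link_path("/export/images2-1/brick",
                         "0xd02f14fcb6724ceba4a330eb606910f3"))
```

Two different UUIDs for the same path on the two bricks is consistent with each replica having created the file independently.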