Joe Julian
2015-Jan-26 21:23 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Mismatched GFIDs would happen if a file is created on multiple replicas during a split-brain event. The GFID is assigned at file creation.

On 01/26/2015 01:04 PM, A Ghoshal wrote:
> Yep, so it is indeed a split-brain caused by a mismatch of the
> trusted.gfid attribute.
>
> Sadly, I don't know precisely what causes it. Communication loss might
> be one of the triggers. I am guessing the files with the problem are
> dynamic, correct? In our setup (also replica 2), communication is never
> a problem, but we do see this when one of the servers reboots. Maybe
> some obscure and difficult-to-understand race between background
> self-heal and the self-heal daemon...
>
> In any case, the normal procedure for split-brain recovery should work
> for you if you wish to get your files working again. It's easy to find
> on Google. I use the instructions on Joe Julian's blog page myself.
>
> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
> To: A Ghoshal <a.ghoshal at tcs.com>
> From: Tiago Santos <tiago at musthavemenus.com>
> Date: 01/27/2015 02:11AM
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
>
> Oh, right!
>
> Here are the outputs:
>
> root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
> # file: templates/assets/prod/temporary/13/user_1339200.png
> trusted.afr.site-images-client-0=0x000000000000000400000000
> trusted.afr.site-images-client-1=0x000000020000000900000000
> trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
>
> real 0m0.024s
> user 0m0.001s
> sys 0m0.001s
>
> root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
> # file: templates/assets/prod/temporary/13/user_1339200.png
> trusted.afr.site-images-client-0=0x000000000000000000000000
> trusted.afr.site-images-client-1=0x000000000000000000000000
> trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
>
> real 0m0.003s
> user 0m0.000s
> sys 0m0.006s
>
> I'm not sure exactly what that means. I'm googling, and would appreciate
> it if you guys could shed some light.
>
> Thanks!
> --
> Tiago
>
> On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:
>
>> Actually you ran getfattr on the volume - which is why the requisite
>> extended attributes never showed up...
>>
>> Your bricks are mounted elsewhere.
>> /exports/images1-1/brick, and exports/images2-1/brick
>>
>> Btw, what version of Linux do you use? And are the files on which you
>> observe the input/output errors soft links?
>>
>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>> To: A Ghoshal <a.ghoshal at tcs.com>
>> From: Tiago Santos <tiago at musthavemenus.com>
>> Date: 01/27/2015 12:20AM
>> Cc: gluster-users <gluster-users at gluster.org>
>> Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
>>
>> Thanks for your input, Anirban.
>>
>> I ran the commands on both servers, with the following results:
>>
>> root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>>
>> real 0m34.524s
>> user 0m0.004s
>> sys 0m0.000s
>>
>> root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
>> getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error
>>
>> real 0m11.315s
>> user 0m0.001s
>> sys 0m0.003s
>> root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
>> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error
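
The GFID mismatch discussed in this message can be checked mechanically once the `getfattr -m . -d -e hex` output from each brick has been saved as text. The following is a minimal Python sketch under that assumption; the helper names `parse_getfattr` and `gfid_of` are hypothetical, and the sample values are the ones posted in this thread.

```python
import uuid


def parse_getfattr(dump):
    """Parse `getfattr -d -e hex` text into {attribute name: hex digits}."""
    attrs = {}
    for line in dump.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." comment line.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop the leading "0x" that `-e hex` adds.
        attrs[name] = value[2:] if value.startswith("0x") else value
    return attrs


def gfid_of(dump):
    """Return trusted.gfid as a UUID (a GFID is a 16-byte UUID)."""
    return uuid.UUID(hex=parse_getfattr(dump)["trusted.gfid"])


# Output captured on each brick, copied from the thread.
WEB3 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
"""

WEB4 = """\
# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
"""

if gfid_of(WEB3) != gfid_of(WEB4):
    print("GFID mismatch:", gfid_of(WEB3), "vs", gfid_of(WEB4))
```

The comparison only reports the mismatch. Deciding which copy to keep is still the manual split-brain recovery step mentioned in the thread.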
Tiago Santos
2015-Jan-26 21:38 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Right.

I have Brick1 being written to constantly, but nothing writes to Brick2. It only gets "healed" data from Brick1.

This setup is not in production yet, and no applications are using that data. I have rsyncs constantly updating Brick1 (bringing data from the production servers), and then Gluster updates Brick2.

Which makes me wonder how I could be creating files on multiple replicas during a split-brain.

Could it be that, during a split-brain event, I am updating versions of the same file on Brick1 (only), and Gluster treats them as different versions and gets confused?

Anyway, while we talk I'm going to run Joe's precious procedure for split-brain recovery.

On Mon, Jan 26, 2015 at 7:23 PM, Joe Julian <joe at julianfamily.org> wrote:
> Mismatched GFIDs would happen if a file is created on multiple replicas
> during a split-brain event. The GFID is assigned at file creation.

--
Tiago Santos
MustHaveMenus.com
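
The `trusted.afr.*` values in the getfattr output posted earlier in this thread can be decoded to see what each brick records as pending for the other. The following is a minimal Python sketch, assuming the commonly described AFR changelog layout of three big-endian 32-bit counters (pending data, metadata, and entry operations); that layout is an assumption here and should be checked against the documentation for the Gluster version in use. `decode_afr` is a hypothetical helper name.

```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Assumed layout: data, metadata, entry, each an unsigned 32-bit
    big-endian integer.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(digits))
    return {"data": data, "metadata": metadata, "entry": entry}


# Values reported on web3's brick in the thread.
web3 = {
    "trusted.afr.site-images-client-0": "0x000000000000000400000000",
    "trusted.afr.site-images-client-1": "0x000000020000000900000000",
}

for name, value in web3.items():
    print(name, decode_afr(value))
```

Under that assumed layout, web3's `client-1` value decodes to 2 pending data and 9 pending metadata operations, while every counter on web4's brick is zero.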