Tiago Santos
2015-Jan-26 22:12 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
That's what I meant. Sorry for the confusion.

I'm writing on Client1 (the same server as Brick1). Client2 (which mounts
Brick2, on server2) has nothing writing to it (so far).

What I'm wondering is how I ended up with a split-brain if I'm only writing
on one client.

On Mon, Jan 26, 2015 at 8:04 PM, Joe Julian <joe at julianfamily.org> wrote:

> Nothing but GlusterFS should be writing to bricks. Mount a client and
> write there.
>
> On 01/26/2015 01:38 PM, Tiago Santos wrote:
>
> Right.
>
> I have Brick1 being constantly written to, but nothing writing to Brick2.
> It just gets "healed" data from Brick1.
>
> This setup is not yet in production, and no applications are using that
> data. I have rsyncs constantly updating Brick1 (bringing data over from
> the production servers), and then Gluster updates Brick2.
>
> Which makes me wonder how I could be creating multiple replicas during a
> split-brain.
>
> Could it be that, during a split-brain event, I am updating versions of
> the same file on Brick1 (only), Gluster treats them as different
> versions, and things get confused?
>
> Anyway, while we talk I'm going to run Joe's procedure for split-brain
> recovery.
>
> On Mon, Jan 26, 2015 at 7:23 PM, Joe Julian <joe at julianfamily.org> wrote:
>
>> Mismatched GFIDs would happen if a file is created on multiple replicas
>> during a split-brain event. The GFID is assigned at file creation.
>>
>> On 01/26/2015 01:04 PM, A Ghoshal wrote:
>>
>>> Yep, so it is indeed a split-brain caused by a mismatch of the
>>> trusted.gfid attribute.
>>>
>>> Sadly, I don't know precisely what causes it. Communication loss might
>>> be one of the triggers. I am guessing the files with the problem are
>>> dynamic, correct? In our setup (also replica 2), communication is never
>>> a problem, but we do see this when one of the servers takes a reboot.
>>> Maybe some obscure and difficult-to-understand race between the
>>> background self-heal and the self-heal daemon...
>>>
>>> In any case, the normal procedure for split-brain recovery will work if
>>> you wish to get your files working again. It's easy to find on Google.
>>> I use the instructions on Joe Julian's blog page myself.
>>>
>>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>>>
>>> ======================
>>> To: A Ghoshal <a.ghoshal at tcs.com>
>>> From: Tiago Santos <tiago at musthavemenus.com>
>>> Date: 01/27/2015 02:11AM
>>> Cc: gluster-users <gluster-users at gluster.org>
>>> Subject: Re: [Gluster-users] Pretty much any operation related to
>>> Gluster mounted fs hangs for a while
>>> ======================
>>>
>>> Oh, right!
>>>
>>> Here are the outputs:
>>>
>>> root at web3:/export/images1-1/brick# time getfattr -m . -d -e hex \
>>>     templates/assets/prod/temporary/13/user_1339200.png
>>> # file: templates/assets/prod/temporary/13/user_1339200.png
>>> trusted.afr.site-images-client-0=0x000000000000000400000000
>>> trusted.afr.site-images-client-1=0x000000020000000900000000
>>> trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
>>>
>>> real 0m0.024s
>>> user 0m0.001s
>>> sys  0m0.001s
>>>
>>> root at web4:/export/images2-1/brick# time getfattr -m . -d -e hex \
>>>     templates/assets/prod/temporary/13/user_1339200.png
>>> # file: templates/assets/prod/temporary/13/user_1339200.png
>>> trusted.afr.site-images-client-0=0x000000000000000000000000
>>> trusted.afr.site-images-client-1=0x000000000000000000000000
>>> trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
>>>
>>> real 0m0.003s
>>> user 0m0.000s
>>> sys  0m0.006s
>>>
>>> Not sure exactly what that means. I'm googling, and would appreciate it
>>> if you guys could shed some light.
>>>
>>> Thanks!
>>> --
>>> Tiago
>>>
>>> On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:
>>>
>>>> Actually, you ran getfattr on the volume, which is why the requisite
>>>> extended attributes never showed up...
>>>>
>>>> Your bricks are mounted elsewhere:
>>>> /export/images1-1/brick and /export/images2-1/brick
>>>>
>>>> Btw, what version of Linux do you use? And are the files you observe
>>>> the input/output errors on soft links?
>>>>
>>>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>>>>
>>>> ======================
>>>> To: A Ghoshal <a.ghoshal at tcs.com>
>>>> From: Tiago Santos <tiago at musthavemenus.com>
>>>> Date: 01/27/2015 12:20AM
>>>> Cc: gluster-users <gluster-users at gluster.org>
>>>> Subject: Re: [Gluster-users] Pretty much any operation related to
>>>> Gluster mounted fs hangs for a while
>>>> ======================
>>>>
>>>> Thanks for your input, Anirban.
>>>>
>>>> I ran the commands on both servers, with the following results:
>>>>
>>>> root at web3:/var/www/site-images# time getfattr -m . -d -e hex \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>>
>>>> real 0m34.524s
>>>> user 0m0.004s
>>>> sys  0m0.000s
>>>>
>>>> root at web4:/var/www/site-images# time getfattr -m . -d -e hex \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>> getfattr: templates/assets/prod/temporary/13/user_1339200.png:
>>>> Input/output error
>>>>
>>>> real 0m11.315s
>>>> user 0m0.001s
>>>> sys  0m0.003s
>>>> root at web4:/var/www/site-images# ls \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png:
>>>> Input/output error
>>>
>>> =====-----=====-----=====
>>> Notice: The information contained in this e-mail message and/or
>>> attachments to it may contain confidential or privileged information.
>>> If you are not the intended recipient, any dissemination, use, review,
>>> distribution, printing or copying of the information contained in this
>>> e-mail message and/or attachments to it is strictly prohibited. If you
>>> have received this communication in error, please notify us by reply
>>> e-mail or telephone and immediately and permanently delete the message
>>> and any attachments. Thank you.
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users

--
Tiago Santos
MustHaveMenus.com
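The two getfattr dumps quoted above carry all the diagnostic information: the `trusted.gfid` values differ (a GFID split-brain), and web3's `trusted.afr.*` values hold nonzero pending counters. A rough Python sketch of how to compare them mechanically, assuming the documented AFR changelog layout (three 32-bit big-endian counters for pending data, metadata, and entry operations); the sample data is taken verbatim from the thread:

```python
# Sketch (not from the thread): compare `getfattr -m . -d -e hex` output
# from two replica bricks and flag a GFID split-brain.

def parse_xattrs(text):
    """Return {name: hex_value} from getfattr output lines."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("trusted.") and "=" in line:
            name, value = line.split("=", 1)
            attrs[name] = value
    return attrs

def afr_counts(value):
    """Split a trusted.afr.* value into (data, metadata, entry) pending counts."""
    raw = value[2:]  # drop the leading "0x"
    return tuple(int(raw[i:i + 8], 16) for i in (0, 8, 16))

web3 = """\
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527"""

web4 = """\
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3"""

a, b = parse_xattrs(web3), parse_xattrs(web4)
print("gfid mismatch:", a["trusted.gfid"] != b["trusted.gfid"])          # True
print("web3 pending on client-1:", afr_counts(a["trusted.afr.site-images-client-1"]))
```

Here web3 blames client-1 with nonzero data and metadata pending counts while web4's changelogs are all zero, and the GFIDs disagree, which matches Joe's "file created on multiple replicas during a split-brain" diagnosis.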
Joe Julian
2015-Jan-26 22:36 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Check your client logs. Perhaps the client isn't actually connecting to
both servers.

On 01/26/2015 02:12 PM, Tiago Santos wrote:

> That's what I meant. Sorry for the confusion.
>
> I'm writing on Client1 (the same server as Brick1). Client2 (which mounts
> Brick2, on server2) has nothing writing to it (so far).
>
> What I'm wondering is how I ended up with a split-brain if I'm only
> writing on one client.
>
> [...]
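The recovery procedure Ghoshal and Tiago refer to (the one on Joe Julian's blog) boils down to deleting the bad copy of the file from one brick together with its hard link under that brick's `.glusterfs/` directory, then letting self-heal recreate it from the good replica. A rough sketch, only of the path computation: this helper derives the `.glusterfs` hard-link path from a `trusted.gfid` hex value like the ones in the getfattr output above; it does not decide which copy is bad, and any paths should be double-checked before deleting anything:

```python
# Sketch (assumed layout): GlusterFS keeps a hard link to every file under
# <brick>/.glusterfs/<first 2 hex digits>/<next 2>/<full gfid as a UUID>.
import os.path
import uuid

def glusterfs_link_path(brick_root, gfid_hex):
    """Return the .glusterfs hard-link path for a trusted.gfid hex value."""
    g = str(uuid.UUID(gfid_hex.replace("0x", "")))  # canonical dashed form
    return os.path.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)

# The copy on web4 had trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3:
print(glusterfs_link_path("/export/images2-1/brick",
                          "0xd02f14fcb6724ceba4a330eb606910f3"))
# -> /export/images2-1/brick/.glusterfs/d0/2f/d02f14fc-b672-4ceb-a4a3-30eb606910f3
```

Removing both the file and that hard link on the brick chosen as "bad" (and then stat-ing the file through a client mount to trigger self-heal) is the essence of the procedure; leaving the `.glusterfs` link behind is a common way for the split-brain to reappear.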