Tiago Santos
2015-Jan-26 22:12 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
That's what I meant. Sorry for the confusion.

I'm writing on Client1 (the same server as Brick1). Client2 (which mounts
Brick2, on server2) has nothing writing to it (so far).

What I'm wondering is how I ended up with a split-brain if I'm only writing
on one client.

On Mon, Jan 26, 2015 at 8:04 PM, Joe Julian <joe at julianfamily.org> wrote:

> Nothing but GlusterFS should be writing to bricks. Mount a client and
> write there.
>
> On 01/26/2015 01:38 PM, Tiago Santos wrote:
>
> Right.
>
> I have Brick1 being constantly written to, but nothing writing to Brick2.
> It just gets "healed" data from Brick1.
>
> This setup is not yet in production, and no applications are using that
> data. I have rsyncs constantly updating Brick1 (bringing data over from
> the production servers), and then Gluster updates Brick2.
>
> Which makes me wonder how I could be creating multiple replicas during a
> split-brain.
>
> Could it be that, during a split-brain event, I am updating versions of
> the same file on Brick1 (only), Gluster treats them as different
> versions, and things get confused?
>
> Anyway, while we talk I'm going to run Joe's procedure for split-brain
> recovery.
>
> On Mon, Jan 26, 2015 at 7:23 PM, Joe Julian <joe at julianfamily.org> wrote:
>
>> Mismatched GFIDs would happen if a file is created on multiple replicas
>> during a split-brain event. The GFID is assigned at file creation.
>>
>> On 01/26/2015 01:04 PM, A Ghoshal wrote:
>>
>>> Yep, so it is indeed a split-brain caused by a mismatch of the
>>> trusted.gfid attribute.
>>>
>>> Sadly, I don't know precisely what causes it. Communication loss might
>>> be one of the triggers. I am guessing the files with the problem are
>>> dynamic, correct? In our setup (also replica 2), communication is never
>>> a problem, but we do see this when one of the servers takes a reboot.
>>> Maybe some obscure and difficult-to-understand race between the
>>> background self-heal and the self-heal daemon...
>>>
>>> In any case, the normal procedure for split-brain recovery will work if
>>> you wish to get your files working again. It's easy to find on Google.
>>> I use the instructions on Joe Julian's blog page myself.
>>>
>>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>>>
>>> ======================
>>> To: A Ghoshal <a.ghoshal at tcs.com>
>>> From: Tiago Santos <tiago at musthavemenus.com>
>>> Date: 01/27/2015 02:11AM
>>> Cc: gluster-users <gluster-users at gluster.org>
>>> Subject: Re: [Gluster-users] Pretty much any operation related to
>>> Gluster mounted fs hangs for a while
>>> ======================
>>>
>>> Oh, right!
>>>
>>> Here are the outputs:
>>>
>>> root at web3:/export/images1-1/brick# time getfattr -m . -d -e hex \
>>>     templates/assets/prod/temporary/13/user_1339200.png
>>> # file: templates/assets/prod/temporary/13/user_1339200.png
>>> trusted.afr.site-images-client-0=0x000000000000000400000000
>>> trusted.afr.site-images-client-1=0x000000020000000900000000
>>> trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
>>>
>>> real 0m0.024s
>>> user 0m0.001s
>>> sys  0m0.001s
>>>
>>> root at web4:/export/images2-1/brick# time getfattr -m . -d -e hex \
>>>     templates/assets/prod/temporary/13/user_1339200.png
>>> # file: templates/assets/prod/temporary/13/user_1339200.png
>>> trusted.afr.site-images-client-0=0x000000000000000000000000
>>> trusted.afr.site-images-client-1=0x000000000000000000000000
>>> trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
>>>
>>> real 0m0.003s
>>> user 0m0.000s
>>> sys  0m0.006s
>>>
>>> Not sure exactly what that means. I'm googling, and would appreciate it
>>> if you guys could shed some light.
>>>
>>> Thanks!
>>> --
>>> Tiago
>>>
>>> On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal <a.ghoshal at tcs.com> wrote:
>>>
>>>> Actually, you ran getfattr on the volume, which is why the requisite
>>>> extended attributes never showed up...
>>>>
>>>> Your bricks are mounted elsewhere:
>>>> /export/images1-1/brick and /export/images2-1/brick
>>>>
>>>> Btw, what version of Linux do you use? And are the files you observe
>>>> the input/output errors on soft links?
>>>>
>>>> -----Tiago Santos <tiago at musthavemenus.com> wrote: -----
>>>>
>>>> ======================
>>>> To: A Ghoshal <a.ghoshal at tcs.com>
>>>> From: Tiago Santos <tiago at musthavemenus.com>
>>>> Date: 01/27/2015 12:20AM
>>>> Cc: gluster-users <gluster-users at gluster.org>
>>>> Subject: Re: [Gluster-users] Pretty much any operation related to
>>>> Gluster mounted fs hangs for a while
>>>> ======================
>>>>
>>>> Thanks for your input, Anirban.
>>>>
>>>> I ran the commands on both servers, with the following results:
>>>>
>>>> root at web3:/var/www/site-images# time getfattr -m . -d -e hex \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>>
>>>> real 0m34.524s
>>>> user 0m0.004s
>>>> sys  0m0.000s
>>>>
>>>> root at web4:/var/www/site-images# time getfattr -m . -d -e hex \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>> getfattr: templates/assets/prod/temporary/13/user_1339200.png:
>>>> Input/output error
>>>>
>>>> real 0m11.315s
>>>> user 0m0.001s
>>>> sys  0m0.003s
>>>> root at web4:/var/www/site-images# ls \
>>>>     templates/assets/prod/temporary/13/user_1339200.png
>>>> ls: cannot access templates/assets/prod/temporary/13/user_1339200.png:
>>>> Input/output error
>>>
>>> =====-----=====-----=====
>>> Notice: The information contained in this e-mail message and/or
>>> attachments to it may contain confidential or privileged information.
>>> If you are not the intended recipient, any dissemination, use, review,
>>> distribution, printing or copying of the information contained in this
>>> e-mail message and/or attachments to it is strictly prohibited. If you
>>> have received this communication in error, please notify us by reply
>>> e-mail or telephone and immediately and permanently delete the message
>>> and any attachments. Thank you.
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users

--
Tiago Santos
MustHaveMenus.com
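The two getfattr dumps quoted above carry all the diagnostic information: the `trusted.gfid` values differ (a GFID split-brain), and web3's `trusted.afr.*` values hold nonzero pending counters. A rough Python sketch of how to compare them mechanically, assuming the documented AFR changelog layout (three 32-bit big-endian counters for pending data, metadata, and entry operations); the sample data is taken verbatim from the thread:

```python
# Sketch (not from the thread): compare `getfattr -m . -d -e hex` output
# from two replica bricks and flag a GFID split-brain.

def parse_xattrs(text):
    """Return {name: hex_value} from getfattr output lines."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("trusted.") and "=" in line:
            name, value = line.split("=", 1)
            attrs[name] = value
    return attrs

def afr_counts(value):
    """Split a trusted.afr.* value into (data, metadata, entry) pending counts."""
    raw = value[2:]  # drop the leading "0x"
    return tuple(int(raw[i:i + 8], 16) for i in (0, 8, 16))

web3 = """\
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527"""

web4 = """\
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3"""

a, b = parse_xattrs(web3), parse_xattrs(web4)
print("gfid mismatch:", a["trusted.gfid"] != b["trusted.gfid"])          # True
print("web3 pending on client-1:", afr_counts(a["trusted.afr.site-images-client-1"]))
```

Here web3 blames client-1 with nonzero data and metadata pending counts while web4's changelogs are all zero, and the GFIDs disagree, which matches Joe's "file created on multiple replicas during a split-brain" diagnosis.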
Joe Julian
2015-Jan-26 22:36 UTC
[Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Check your client logs. Perhaps the client isn't actually connecting to
both servers.

On 01/26/2015 02:12 PM, Tiago Santos wrote:

> That's what I meant. Sorry for the confusion.
>
> I'm writing on Client1 (the same server as Brick1). Client2 (which mounts
> Brick2, on server2) has nothing writing to it (so far).
>
> What I'm wondering is how I ended up with a split-brain if I'm only
> writing on one client.
>
> [...]
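The recovery procedure Ghoshal and Tiago refer to (the one on Joe Julian's blog) boils down to deleting the bad copy of the file from one brick together with its hard link under that brick's `.glusterfs/` directory, then letting self-heal recreate it from the good replica. A rough sketch, only of the path computation: this helper derives the `.glusterfs` hard-link path from a `trusted.gfid` hex value like the ones in the getfattr output above; it does not decide which copy is bad, and any paths should be double-checked before deleting anything:

```python
# Sketch (assumed layout): GlusterFS keeps a hard link to every file under
# <brick>/.glusterfs/<first 2 hex digits>/<next 2>/<full gfid as a UUID>.
import os.path
import uuid

def glusterfs_link_path(brick_root, gfid_hex):
    """Return the .glusterfs hard-link path for a trusted.gfid hex value."""
    g = str(uuid.UUID(gfid_hex.replace("0x", "")))  # canonical dashed form
    return os.path.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)

# The copy on web4 had trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3:
print(glusterfs_link_path("/export/images2-1/brick",
                          "0xd02f14fcb6724ceba4a330eb606910f3"))
# -> /export/images2-1/brick/.glusterfs/d0/2f/d02f14fc-b672-4ceb-a4a3-30eb606910f3
```

Removing both the file and that hard link on the brick chosen as "bad" (and then stat-ing the file through a client mount to trigger self-heal) is the essence of the procedure; leaving the `.glusterfs` link behind is a common way for the split-brain to reappear.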