Dmitry Melekhov
2016-Jul-12 15:57 UTC
[Gluster-users] split-brain recovery automation, any plans?
12.07.2016 17:38, Pranith Kumar Karampuri writes:
> Did you wait for heals to complete before upgrading second node?

no...

> On Tue, Jul 12, 2016 at 3:08 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> 12.07.2016 13:31, Pranith Kumar Karampuri writes:
>>
>> On Mon, Jul 11, 2016 at 2:26 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>> 11.07.2016 12:47, Gandalf Corvotempesta writes:
>>
>> 2016-07-11 9:54 GMT+02:00 Dmitry Melekhov <dm at belkam.com>:
>>
>> We just got split-brain during update to 3.7.13 ;-)
>>
>> This is an interesting point.
>> Could you please tell me which replica count did you set?
>>
>> 3
>>
>> With replica "3" split brain should not occur, right?
>>
>> I guess we did something wrong :-)
>>
>> Or there is a bug we never found? Could you please share details
>> about what you did?
>
> upgraded to 3.7.13 from 3.7.11 using yum, while at least one VM is
> running :-)
> on all 3 servers, one by one:
>
> yum upgrade
> systemctl stop glusterd
> then killed glusterfsd processes using kill
> and systemctl start glusterd
>
> then next server....
>
> after this we tried to restart the VM, but it failed, because we
> forgot to restart libvirtd, and it used the old libraries.
> I guess this is the point where we got this problem.
>
>> I'm planning a new cluster and I would like to be
>> protected against split brains.
>>
>> --
>> Pranith
>
> --
> Pranith
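In shell terms, the per-node sequence described in the quoted message amounts to roughly the following (a sketch only: the exact kill invocation is not given above, so pkill is an assumption, and the libvirtd restart is the step that was missed):

    yum upgrade                     # pulls in the 3.7.13 gluster packages
    systemctl stop glusterd         # stop the management daemon
    pkill glusterfsd                # brick processes keep running and were killed by hand (pkill assumed equivalent)
    systemctl start glusterd        # bricks respawn on the new version
    systemctl restart libvirtd      # without this, qemu/libvirt keep the old gluster client libraries loaded

This was run on each of the three servers in turn; as the replies below point out, the missing piece is a heal-completion check before moving on to the next server.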
Darrell Budic
2016-Jul-12 20:20 UTC
[Gluster-users] split-brain recovery automation, any plans?
FYI, it's my experience that "yum upgrade" will stop the running glusterd (and possibly the running glusterfsds) during its installation of new gluster components. I've also noticed it starts them back up again during the process. I.e., yesterday I upgraded a system to 3.7.13:

systemctl stop glusterd
<manually kill all glusterfsds>
yum upgrade

and discovered that glusterd was running again, and had started during the yum upgrade processing. All its glusterfsds had also started. Somewhat annoying, actually, because I had been planning to reboot the server to switch to the latest kernel as part of the process, but really didn't feel like interrupting the heals at that time.

This probably didn't have much impact on you, but it would have restarted any healing that resulted from the upgrade downtime twice. You may have caused yourself some extra wait for the first round of healing to conclude if anything was using those volumes at the time. If you didn't wait for those heals to conclude before starting your next upgrade, you could have caused a split-brain on affected active files.

  -Darrell

> On Jul 12, 2016, at 10:57 AM, Dmitry Melekhov <dm at belkam.com> wrote:
>
> 12.07.2016 17:38, Pranith Kumar Karampuri writes:
>> Did you wait for heals to complete before upgrading second node?
>
> no...
>
>> On Tue, Jul 12, 2016 at 3:08 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>> 12.07.2016 13:31, Pranith Kumar Karampuri writes:
>>>
>>> On Mon, Jul 11, 2016 at 2:26 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>>> 11.07.2016 12:47, Gandalf Corvotempesta writes:
>>> 2016-07-11 9:54 GMT+02:00 Dmitry Melekhov <dm at belkam.com>:
>>> We just got split-brain during update to 3.7.13 ;-)
>>> This is an interesting point.
>>> Could you please tell me which replica count did you set?
>>>
>>> 3
>>>
>>> With replica "3" split brain should not occur, right?
>>>
>>> I guess we did something wrong :-)
>>>
>>> Or there is a bug we never found? Could you please share details about what you did?
>>
>> upgraded to 3.7.13 from 3.7.11 using yum, while at least one VM is running :-)
>> on all 3 servers, one by one:
>>
>> yum upgrade
>> systemctl stop glusterd
>> then killed glusterfsd processes using kill
>> and systemctl start glusterd
>>
>> then next server....
>>
>> after this we tried to restart the VM, but it failed, because we forgot to restart libvirtd, and it used the old libraries.
>> I guess this is the point where we got this problem.
>>
>>> I'm planning a new cluster and I would like to be protected against split brains.
>>>
>>> --
>>> Pranith
>>
>> --
>> Pranith
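A minimal way to act on that advice, assuming a single volume named "vmstore" (a hypothetical name) and the stock gluster CLI, is to check what the package scripts left running and then poll heal-info until no pending entries remain before touching the next server:

    yum upgrade
    systemctl is-active glusterd      # the package scripts may have started it again, as noted above
    pgrep -a glusterfsd               # and respawned the brick processes

    # "vmstore" is a placeholder volume name; wait until every brick reports "Number of entries: 0"
    until [ "$(gluster volume heal vmstore info \
               | awk '/Number of entries:/ {s += $NF} END {print s+0}')" -eq 0 ]; do
        sleep 60
    done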
Pranith Kumar Karampuri
2016-Jul-13 03:44 UTC
[Gluster-users] split-brain recovery automation, any plans?
On Tue, Jul 12, 2016 at 9:27 PM, Dmitry Melekhov <dm at belkam.com> wrote:

> 12.07.2016 17:38, Pranith Kumar Karampuri writes:
>
> Did you wait for heals to complete before upgrading second node?
>
> no...

So basically, if you have operations in progress on the mount, you should wait for heals to complete before you upgrade the second node. If you have all operations on all the mounts stopped, or you have unmounted all the mounts for the volume, then you can upgrade all the servers one by one and then the clients. Otherwise it will lead to problems.

That said, in a 3-way replica it shouldn't cause split-brains, so I would like to know the exact steps that led to this problem. We know of one issue which leads to split-brains in the case of VM workloads where bricks are taken down in a cyclic manner without waiting for heals to complete. I wonder if the steps that led to the split-brain on your setup are similar. We are targeting this for future releases...

> On Tue, Jul 12, 2016 at 3:08 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>
>> 12.07.2016 13:31, Pranith Kumar Karampuri writes:
>>
>> On Mon, Jul 11, 2016 at 2:26 PM, Dmitry Melekhov <dm at belkam.com> wrote:
>>
>>> 11.07.2016 12:47, Gandalf Corvotempesta writes:
>>>
>>>> 2016-07-11 9:54 GMT+02:00 Dmitry Melekhov <dm at belkam.com>:
>>>>
>>>>> We just got split-brain during update to 3.7.13 ;-)
>>>>
>>>> This is an interesting point.
>>>> Could you please tell me which replica count did you set?
>>>
>>> 3
>>>
>>>> With replica "3" split brain should not occur, right?
>>>
>>> I guess we did something wrong :-)
>>
>> Or there is a bug we never found? Could you please share details about
>> what you did?
>>
>> upgraded to 3.7.13 from 3.7.11 using yum, while at least one VM is
>> running :-)
>> on all 3 servers, one by one:
>>
>> yum upgrade
>> systemctl stop glusterd
>> then killed glusterfsd processes using kill
>> and systemctl start glusterd
>>
>> then next server....
>>
>> after this we tried to restart the VM, but it failed, because we forgot to
>> restart libvirtd, and it used the old libraries.
>> I guess this is the point where we got this problem.
>>
>>>> I'm planning a new cluster and I would like to be protected against
>>>> split brains.
>>
>> --
>> Pranith
>
> --
> Pranith

-- 
Pranith
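A slightly more complete sketch of the "wait for heals before the second node" step, looping over every volume the cluster reports rather than a single hard-coded name:

    #!/bin/bash
    # run on a just-upgraded server once glusterd and the bricks are back up
    for vol in $(gluster volume list); do
        while true; do
            pending=$(gluster volume heal "$vol" info \
                      | awk '/Number of entries:/ {s += $NF} END {print s+0}')
            [ "$pending" -eq 0 ] && break
            echo "volume $vol: $pending entries still healing, waiting..."
            sleep 60
        done
    done

Only after this exits for all volumes should glusterd and the brick processes on the next server be taken down; that ordering avoids the cyclic take-down pattern mentioned above.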