thr3ads.net - Gluster users - [Gluster-users] AFR arbiter volumes [Sep 2015]


Ravishankar N

2015-Sep-09 01:46 UTC

[Gluster-users] AFR arbiter volumes

Sending out this mail for awareness/ feedback.
-----------------------------------------------------------------------------
*What:*
Since glusterfs-3.7, AFR supports the creation of arbiter volumes. These
are a special type of replica 3 gluster volume where the 3rd brick is
(always) configured as an arbiter node. This means that the 3rd brick
stores only the file names and metadata (including gluster xattrs), but
no file data. Arbiter volumes prevent split-brains, consume less space
than a normal replica 3 volume, and provide better consistency and
availability than a replica 2 volume.

*How:*
You can create an arbiter volume with the following command:

    gluster volume create <VOLNAME> replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3

Note that the syntax is the same as for creating a normal replica 3
volume, except for the 'arbiter 1' keyword. As seen in the command
above, the only permissible values for the replica count and arbiter
count are 3 and 1 respectively. Also, the 3rd brick is always chosen as
the arbiter brick; choosing any other brick as the arbiter is currently
not configurable.
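
For anyone scripting volume creation, here is a minimal Python sketch that assembles the command above and enforces the "exactly 3 bricks, last one is the arbiter" rule. The helper name, the volume name and the brick paths are made up for illustration; only the 'replica 3 arbiter 1' syntax comes from this mail. It just builds the argument list and does not run gluster.

```python
import shlex


def arbiter_create_command(volname, bricks):
    """Return the argv list for creating an arbiter volume.

    Only the 'replica 3 arbiter 1' form described above is produced.
    The last entry in `bricks` is the one that becomes the arbiter.
    """
    if len(bricks) != 3:
        raise ValueError("an arbiter volume needs exactly 3 bricks")
    return ["gluster", "volume", "create", volname,
            "replica", "3", "arbiter", "1", *bricks]


# Hypothetical volume name and brick paths, for illustration only.
cmd = arbiter_create_command(
    "testvol",
    ["host1:/bricks/b1", "host2:/bricks/b2", "host3:/bricks/arb"],
)
print(shlex.join(cmd))
```

Passing the list to subprocess.run() on a node in the trusted pool would be the obvious next step, but that part is left out here.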

*Client/ Mount behaviour:*
By default, client quorum (cluster.quorum-type) is set to auto for a
replica 3 volume (including arbiter volumes) when it is created; i.e. at
least 2 bricks need to be up to satisfy quorum and to allow writes. This
setting should not be changed for arbiter volumes either. In addition,
arbiter volumes perform some extra checks to prevent files from ending
up in split-brain:

* Clients take full file locks when writing to a file as opposed to
range locks in a normal replica 3 volume.

* If 2 bricks are up, one of them is the arbiter (i.e. the 3rd brick),
and it blames the other up brick, then all FOPS (file operations) will
fail with ENOTCONN (Transport endpoint is not connected). If the arbiter
doesn't blame the other brick, FOPS will be allowed to proceed.
'Blaming' here refers to the values of the AFR changelog extended
attributes.

* If 2 bricks are up and the arbiter is down, then FOPS will be
allowed.

* In all cases, if there is only one source before the FOP is
initiated and if the FOP fails on that source, the application will
receive ENOTCONN.
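
The rules in the list above can be summarised as a small decision function. This is only a toy model in Python, not AFR code: the function name and the brick numbering (1 and 2 for the data bricks, 3 for the arbiter) are assumptions for illustration. The case with all three bricks up is simply treated as allowed because quorum is met, and the last rule (the FOP failing on the only source) is not modelled.

```python
ARBITER = 3  # the 3rd brick is always the arbiter


def writes_allowed(up_bricks, arbiter_blames_other=False):
    """Toy model of the client-side checks for one arbiter replica set.

    up_bricks: set of brick numbers (1, 2, 3) the client can reach.
    arbiter_blames_other: True if the arbiter's changelog xattrs blame
        the one data brick that is still up.
    """
    if len(up_bricks) < 2:
        return False  # client quorum (auto) needs at least 2 bricks
    if len(up_bricks) == 3:
        return True   # assumed: quorum met, nothing special to check
    if ARBITER in up_bricks:
        # One data brick plus the arbiter: fail (ENOTCONN) if it is blamed.
        return not arbiter_blames_other
    return True       # both data bricks up, arbiter down


print(writes_allowed({1, 2}))                             # arbiter down
print(writes_allowed({1, 3}))                             # no blame
print(writes_allowed({1, 3}, arbiter_blames_other=True))  # blamed
```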

Note: It is possible to see from the mount point whether a replica 3
volume has the arbiter configuration. If
$mount_point/.meta/graphs/active/$V0-replicate-0/options/arbiter-count
exists and its value is 1, then it is an arbiter volume. Also, the client
volume graph will have arbiter-count as an xlator option for AFR translators.
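
A minimal Python sketch of that check, assuming $V0 in the path stands for the volume name and that only the first replica subvolume (replicate-0) needs to be inspected. The function name and the example mount point are made up.

```python
from pathlib import Path


def is_arbiter_volume(mount_point, volname, subvol_index=0):
    """Return True if the .meta tree reports arbiter-count == 1.

    Assumes `volname` is what $V0 stands for in the path above.
    """
    option = (Path(mount_point) / ".meta" / "graphs" / "active"
              / f"{volname}-replicate-{subvol_index}"
              / "options" / "arbiter-count")
    try:
        return option.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: not an arbiter volume (or not mounted).
        return False


# Hypothetical mount point and volume name.
print(is_arbiter_volume("/mnt/glusterfs", "testvol"))
```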

*Self-heal daemon behaviour:*

Since the arbiter brick does not store any data for the files, it cannot
be used as a source for data self-heal. For example, if there are 2
source bricks, B2 and B3 (B3 being the arbiter brick), and B2 is down,
then data self-heal will not happen from B3 to the sink brick B1; it will
remain pending until B2 comes back up and the heal can happen from it.
Note that metadata and entry self-heals can still happen from B3 if it is
one of the sources.
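
The example above can be written as a short Python sketch. It is only an illustration of the rule, not the self-heal daemon's logic; the function name and the brick labels as strings are assumptions.

```python
ARBITER = "B3"  # the 3rd brick holds no file data


def usable_heal_sources(heal_type, sources, up_bricks):
    """Return the source bricks a heal of the given type could use.

    heal_type: "data", "metadata" or "entry".
    sources:   bricks that are good copies according to the changelogs.
    up_bricks: bricks that are currently reachable.
    """
    usable = [b for b in sources if b in up_bricks]
    if heal_type == "data":
        # The arbiter has no data, so it can never be a data source.
        usable = [b for b in usable if b != ARBITER]
    return usable


# Sources are B2 and B3, B2 is down, B1 is the sink.
up = {"B1", "B3"}
print(usable_heal_sources("data", ["B2", "B3"], up))      # empty: heal stays pending
print(usable_heal_sources("metadata", ["B2", "B3"], up))  # B3 can still be used
```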

-----------------------------------------------------------------------------

Please provide feedback if you have tried it out.
*If you ever encounter a split-brain while using the arbiter volume, it
is a BUG - do report!*
We have had users asking for a way to convert existing replica 2 volumes
to arbiter volumes; this is definitely on our to-do list, in addition to
some performance optimizations.

Thanks,
Ravi

Nagaprasad Sathyanarayana

2015-Sep-09 03:06 UTC


[Gluster-users] [Gluster-devel] AFR arbiter volumes

Thanks, Ravi, for explaining this so nicely. A question on the following section:

"If 2 bricks are up and if one of them is the arbiter (i.e. the 3rd brick)
and it blames the other up brick, then all FOPS will fail with ENOTCONN
(Transport endpoint is not connected). If the arbiter doesn't blame the
other brick, FOPS will be allowed to proceed. 'Blaming' here is w.r.t
the values of AFR changelog extended attributes."

Q: Under what circumstances does the arbiter brick blame, or not blame, the other brick?

Thanks
Naga

David Gossage

2015-Sep-09 15:57 UTC


[Gluster-users] AFR arbiter volumes

Once a volume has been created as an arbiter volume, can it later be
changed to a normal replica 3 volume with all bricks containing data?

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

