True, but to work your way into that problem with replica 3 is a lot harder to achieve than with just replica 2 + arbiter.

On 7 September 2017 at 14:06, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:

> Hi Neil, docs mention two live nodes of replica 3 blaming each other and refusing to do IO.
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume
>
> On Sep 7, 2017 17:52, "Alastair Neil" <ajneil.tech at gmail.com> wrote:
>
>> *shrug* I don't use arbiter for VM workloads, just straight replica 3. There are some gotchas with using an arbiter for VM workloads. If quorum-type is auto and a brick that is not the arbiter drops out, then if the remaining data brick is dirty as far as the arbiter is concerned (i.e. the only good copy is on the down brick), you will get ENOTCONN and your VMs will halt on IO.
>>
>> On 6 September 2017 at 16:06, <lemonnierk at ulrar.net> wrote:
>>
>>> Mh, I never had to do that and I never had that problem. Is that an arbiter-specific thing? With replica 3 it just works.
>>>
>>> On Wed, Sep 06, 2017 at 03:59:14PM -0400, Alastair Neil wrote:
>>> > You need to set
>>> >
>>> > cluster.server-quorum-ratio 51%
>>> >
>>> > On 6 September 2017 at 10:12, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > I have promised to do some testing and I finally found some time and infrastructure.
>>> > >
>>> > > So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a replicated volume with arbiter (2+1) and a VM on KVM (via OpenStack) with its disk accessible through gfapi. The volume group is set to virt (gluster volume set gv_openstack_1 virt). The VM runs current (all packages updated) Ubuntu Xenial.
>>> > >
>>> > > I set up the following fio job:
>>> > >
>>> > > [job1]
>>> > > ioengine=libaio
>>> > > size=1g
>>> > > loops=16
>>> > > bs=512k
>>> > > direct=1
>>> > > filename=/tmp/fio.data2
>>> > >
>>> > > When I run fio fio.job and reboot one of the data nodes, the IO statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while, the root filesystem gets remounted as read-only.
>>> > >
>>> > > If you care about infrastructure, setup details etc., do not hesitate to ask.
>>> > >
>>> > > Gluster info on volume:
>>> > >
>>> > > Volume Name: gv_openstack_1
>>> > > Type: Replicate
>>> > > Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
>>> > > Status: Started
>>> > > Snapshot Count: 0
>>> > > Number of Bricks: 1 x (2 + 1) = 3
>>> > > Transport-type: tcp
>>> > > Bricks:
>>> > > Brick1: gfs-2.san:/export/gfs/gv_1
>>> > > Brick2: gfs-3.san:/export/gfs/gv_1
>>> > > Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
>>> > > Options Reconfigured:
>>> > > nfs.disable: on
>>> > > transport.address-family: inet
>>> > > performance.quick-read: off
>>> > > performance.read-ahead: off
>>> > > performance.io-cache: off
>>> > > performance.stat-prefetch: off
>>> > > performance.low-prio-threads: 32
>>> > > network.remote-dio: enable
>>> > > cluster.eager-lock: enable
>>> > > cluster.quorum-type: auto
>>> > > cluster.server-quorum-type: server
>>> > > cluster.data-self-heal-algorithm: full
>>> > > cluster.locking-scheme: granular
>>> > > cluster.shd-max-threads: 8
>>> > > cluster.shd-wait-qlength: 10000
>>> > > features.shard: on
>>> > > user.cifs: off
>>> > >
>>> > > Partial KVM XML dump:
>>> > >
>>> > > <disk type='network' device='disk'>
>>> > >   <driver name='qemu' type='raw' cache='none'/>
>>> > >   <source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
>>> > >     <host name='10.0.1.201' port='24007'/>
>>> > >   </source>
>>> > >   <backingStore/>
>>> > >   <target dev='vda' bus='virtio'/>
>>> > >   <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
>>> > >   <alias name='virtio-disk0'/>
>>> > >   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>> > > </disk>
>>> > >
>>> > > Networking is LACP on the data nodes, a stack of Juniper EX4550s (10Gbps SFP+), a separate VLAN for Gluster traffic, and SSD only on all Gluster nodes (including the arbiter).
>>> > >
>>> > > I would really love to know what I am doing wrong, because this has been my experience with Gluster for a long time and a reason I would not recommend it as a VM storage backend in a production environment where you cannot start/stop VMs on your own (e.g. providing private clouds for customers).
>>> > > -ps
>>> > >
>>> > > On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
>>> > > > On 30-08-2017 17:07, Ivan Rossi wrote:
>>> > > >>
>>> > > >> There has been a bug associated with sharding that led to VM corruption, and it has been around for a long time (difficult to reproduce, I understood). I have not seen reports on that for some time after the last fix, so hopefully VM hosting is now stable.
>>> > > >
>>> > > > Mmmm... this is precisely the kind of bug that scares me... data corruption :|
>>> > > > Any more information on what causes it and how to resolve it? Even if it is a solved bug in newer Gluster releases, knowledge on how to treat it would be valuable.
>>> > > >
>>> > > > Thanks.
>>> > > >
>>> > > > --
>>> > > > Danti Gionatan
>>> > > > Supporto Tecnico
>>> > > > Assyoma S.r.l. - www.assyoma.it
>>> > > > email: g.danti at assyoma.it - info at assyoma.it
>>> > > > GPG public key ID: FF5F32A8
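For reference, the server-quorum-ratio suggestion quoted above is a cluster-wide option rather than a per-volume one. A rough sketch of how it would typically be applied, using the volume name from this thread (the 51% value is only the example given above, not a recommendation):

# server-quorum-ratio is global, so it is set on "all" rather than on a single volume
gluster volume set all cluster.server-quorum-ratio 51%
# the per-volume quorum options already visible in the volume info above
gluster volume set gv_openstack_1 cluster.quorum-type auto
gluster volume set gv_openstack_1 cluster.server-quorum-type server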
Seems to be so, but if we look back at the described setup and procedure, what is the reason for the IOPS to stop/fail? Rebooting a node is comparable to updating Gluster, replacing cabling, etc. IMO this should not always end with the arbiter blaming the other node, and even though I have not investigated this issue deeply, I do not believe the blame is the reason for the IOPS to drop.

On Sep 7, 2017 21:25, "Alastair Neil" <ajneil.tech at gmail.com> wrote:

> True, but to work your way into that problem with replica 3 is a lot harder to achieve than with just replica 2 + arbiter.
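As an aside, whether the arbiter really blamed the surviving data brick can be checked on the bricks themselves by inspecting the AFR changelog xattrs. A minimal sketch, assuming the brick paths from this thread and a purely hypothetical file path:

# run on each brick host; the file path below is only an example
getfattr -d -m . -e hex /export/gfs/gv_1/path/to/affected-file
# non-zero trusted.afr.gv_openstack_1-client-* values mean that brick holds
# pending changes (blame) against the other bricks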
FYI I set up replica 3 (no arbiter this time) and did the same thing: rebooted one node during heavy file IO on the VM, and IO stopped. As I mentioned either here or in another thread, this behavior is caused by the high default of network.ping-timeout. My main problem used to be that setting it to low values like 3s or even 2s did not prevent the FS from being mounted read-only in the past (at least with arbiter), and the docs describe the reconnect as very costly. If I set ping-timeout to 1s, the read-only-mount disaster is now prevented. However, I find it very strange, because in the past I actually did end up with a read-only filesystem despite the low ping-timeout.

With replica 3, after a node reboot iftop shows data flowing to only one of the remaining two nodes, and there is no entry in heal info for the volume. An explanation would be very much appreciated ;-)

A few minutes later I reverted back to replica 3 with arbiter (group virt, ping-timeout 1). All nodes are up. During the first fio run the VM disconnected my ssh session, so I reconnected and saw ext4 problems in dmesg. I deleted the VM and started a new one. glustershd.log fills with metadata heals shortly after the fio job starts, but this time the system is stable. Rebooting one of the nodes does not cause any problem (watching the heal log and I/O on the VM).

So I decided to put more stress on the VM's disk: I added a second job with direct=1 and started it (so both are running) while one gluster node is still booting. What happened? One fio job reports "Bus error" and the VM segfaults when trying to run dmesg... Is this gfapi related? Is this a bug in arbiter?
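For completeness, the settings and checks mentioned above correspond roughly to the following commands, using the volume name from earlier in the thread (a sketch of what was tested, not a recommendation):

# lower the ping timeout (the default is 42 seconds)
gluster volume set gv_openstack_1 network.ping-timeout 1
# confirm the current value
gluster volume get gv_openstack_1 network.ping-timeout
# watch for pending heals while a node is down or rebooting
gluster volume heal gv_openstack_1 info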