*shrug* I don't use arbiter for VM workloads, just straight replica 3. There are some gotchas with using an arbiter for VM workloads: if quorum-type is auto and a brick that is not the arbiter drops out, and the remaining up brick is dirty as far as the arbiter is concerned (i.e. the only good copy is on the down brick), you will get ENOTCONN and your VMs will halt on IO.

On 6 September 2017 at 16:06, <lemonnierk at ulrar.net> wrote:

> Mh, I never had to do that and I never had that problem. Is that an
> arbiter-specific thing? With replica 3 it just works.
>
> On Wed, Sep 06, 2017 at 03:59:14PM -0400, Alastair Neil wrote:
> > you need to set
> >
> > cluster.server-quorum-ratio 51%
> >
> > On 6 September 2017 at 10:12, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I promised to do some testing and I finally found some time and
> > > infrastructure.
> > >
> > > So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a
> > > replicated volume with arbiter (2+1) and a VM on KVM (via OpenStack)
> > > with its disk accessed through gfapi. The volume has the virt group
> > > applied (gluster volume set gv_openstack_1 group virt). The VM runs
> > > current (all packages updated) Ubuntu Xenial.
> > >
> > > I set up the following fio job:
> > >
> > > [job1]
> > > ioengine=libaio
> > > size=1g
> > > loops=16
> > > bs=512k
> > > direct=1
> > > filename=/tmp/fio.data2
> > >
> > > When I run fio fio.job and reboot one of the data nodes, the IO
> > > statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while,
> > > the root filesystem gets remounted read-only.
> > >
> > > If you care about infrastructure, setup details etc., do not hesitate
> > > to ask.
> > >
> > > Gluster info on the volume:
> > >
> > > Volume Name: gv_openstack_1
> > > Type: Replicate
> > > Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: gfs-2.san:/export/gfs/gv_1
> > > Brick2: gfs-3.san:/export/gfs/gv_1
> > > Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
> > > Options Reconfigured:
> > > nfs.disable: on
> > > transport.address-family: inet
> > > performance.quick-read: off
> > > performance.read-ahead: off
> > > performance.io-cache: off
> > > performance.stat-prefetch: off
> > > performance.low-prio-threads: 32
> > > network.remote-dio: enable
> > > cluster.eager-lock: enable
> > > cluster.quorum-type: auto
> > > cluster.server-quorum-type: server
> > > cluster.data-self-heal-algorithm: full
> > > cluster.locking-scheme: granular
> > > cluster.shd-max-threads: 8
> > > cluster.shd-wait-qlength: 10000
> > > features.shard: on
> > > user.cifs: off
> > >
> > > Partial KVM XML dump:
> > >
> > > <disk type='network' device='disk'>
> > >   <driver name='qemu' type='raw' cache='none'/>
> > >   <source protocol='gluster'
> > >     name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
> > >     <host name='10.0.1.201' port='24007'/>
> > >   </source>
> > >   <backingStore/>
> > >   <target dev='vda' bus='virtio'/>
> > >   <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
> > >   <alias name='virtio-disk0'/>
> > >   <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
> > >     function='0x0'/>
> > > </disk>
> > >
> > > Networking is LACP on the data nodes, a stack of Juniper EX4550s
> > > (10Gbps SFP+), a separate VLAN for Gluster traffic, and SSD only on
> > > all Gluster nodes (including the arbiter).
> > >
> > > I would really love to know what I am doing wrong, because this has
> > > been my experience with Gluster for a long time, and the reason I
> > > would not recommend it as a VM storage backend in a production
> > > environment where you cannot start/stop VMs on your own (e.g.
> > > providing private clouds for customers).
> > > -ps
> > >
> > > On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
> > > > On 30-08-2017 17:07, Ivan Rossi wrote:
> > > >>
> > > >> There has been a bug associated with sharding that led to VM
> > > >> corruption, and it had been around for a long time (difficult to
> > > >> reproduce, I understood). I have not seen reports on it for some
> > > >> time after the last fix, so hopefully VM hosting is now stable.
> > > >
> > > > Mmmm... this is precisely the kind of bug that scares me... data
> > > > corruption :|
> > > > Any more information on what causes it and how to resolve it? Even
> > > > if it is a solved bug in newer Gluster releases, knowledge of how
> > > > to treat it would be valuable.
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > Danti Gionatan
> > > > Supporto Tecnico
> > > > Assyoma S.r.l. - www.assyoma.it
> > > > email: g.danti at assyoma.it - info at assyoma.it
> > > > GPG public key ID: FF5F32A8

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
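For reference, Alastair's one-liner above corresponds roughly to the commands below. This is a hedged sketch, not verified against this cluster: the volume name `gv_openstack_1` is taken from Pavel's setup, `cluster.server-quorum-ratio` is a cluster-wide option (hence `all`), and the snippet only prints the commands when the gluster CLI is absent.

```shell
#!/bin/sh
# Sketch: apply the server-quorum settings discussed in the thread.
# Assumptions: run on a trusted-pool peer with glusterd up; volume name
# gv_openstack_1 comes from Pavel's "gluster volume info" output.
VOL="gv_openstack_1"
RATIO_CMD="gluster volume set all cluster.server-quorum-ratio 51%"
TYPE_CMD="gluster volume set $VOL cluster.server-quorum-type server"

if command -v gluster >/dev/null 2>&1; then
    $RATIO_CMD && $TYPE_CMD
else
    # No gluster CLI here: show what would be run instead of failing.
    printf 'DRY-RUN: %s\n' "$RATIO_CMD" "$TYPE_CMD"
fi
```

Note that Pavel's volume already has `cluster.server-quorum-type: server` set via the virt group, so on his cluster only the ratio change would be new.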
Hi Neil, the docs mention two live nodes of a replica 3 blaming each other and refusing to do IO.

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume

On Sep 7, 2017 17:52, "Alastair Neil" <ajneil.tech at gmail.com> wrote:

> *shrug* I don't use arbiter for VM workloads, just straight replica 3.
> There are some gotchas with using an arbiter for VM workloads. If
> quorum-type is auto and a brick that is not the arbiter drops out, and the
> remaining up brick is dirty as far as the arbiter is concerned (i.e. the
> only good copy is on the down brick), you will get ENOTCONN and your VMs
> will halt on IO.
>
> [snip]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
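For anyone wanting to reproduce Pavel's test, the fio job from the thread can be recreated as below. This is a sketch: it only writes the job file; the actual run (commented out) assumes fio with libaio support is installed inside the guest, and the reproduction step is rebooting one data node while the job is running.

```shell
# Recreate the fio job file Pavel describes; run fio inside the guest VM
# while rebooting one Gluster data node to reproduce the reported IO hang.
cat > fio.job <<'EOF'
[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2
EOF

# fio fio.job   # uncomment inside the test VM (requires fio + libaio)
```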
True, but working your way into that problem with replica 3 is a lot harder than with replica 2 + arbiter.

On 7 September 2017 at 14:06, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:

> Hi Neil, the docs mention two live nodes of a replica 3 blaming each
> other and refusing to do IO.
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume
>
> [snip]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
I've always wondered what the scenario for these situations is (aside from the docs' description of nodes coming up and down). Aren't Gluster writes atomic across all nodes? I seem to recall Jeff Darcy stating that years ago. So a clean shutdown for maintenance shouldn't be a problem at all: if a node didn't get a write, it is the one likely to fail. So are we really only talking about a crash with data in flight? I suppose a crash during the heal phase after a shutdown could also trigger this issue, especially if you are not using sharding and have huge VM files.

On 9/7/2017 11:06 AM, Pavel Szalbot wrote:

> Hi Neil, the docs mention two live nodes of a replica 3 blaming each
> other and refusing to do IO.
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume
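A practical guard against the heal-phase crash scenario described above is to confirm heals have finished before taking the next node down. The sketch below uses the standard heal-info commands (output format varies by Gluster release; the volume name is the one from this thread, and the snippet only prints the commands where no gluster CLI is present):

```shell
#!/bin/sh
# Sketch: check heal state before planned maintenance on another node.
# Assumption: volume name gv_openstack_1 from earlier in this thread.
VOL="gv_openstack_1"

if command -v gluster >/dev/null 2>&1; then
    # Entries pending heal: should be empty before the next node goes down.
    gluster volume heal "$VOL" info
    # Files the cluster already considers split-brain.
    gluster volume heal "$VOL" info split-brain
else
    printf 'DRY-RUN: gluster volume heal %s info [split-brain]\n' "$VOL"
fi
```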