*shrug* I don't use arbiter for VM workloads, just straight replica 3. There are some gotchas with using an arbiter for VM workloads: if quorum-type is auto and a brick that is not the arbiter drops out, and the remaining up brick is dirty as far as the arbiter is concerned (i.e. the only good copy is on the down brick), you will get ENOTCONN and your VMs will halt on IO.

On 6 September 2017 at 16:06, <lemonnierk at ulrar.net> wrote:

> Mh, I never had to do that and I never had that problem. Is that an
> arbiter-specific thing? With replica 3 it just works.
>
> On Wed, Sep 06, 2017 at 03:59:14PM -0400, Alastair Neil wrote:
> > you need to set
> >
> > cluster.server-quorum-ratio 51%
> >
> > On 6 September 2017 at 10:12, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I promised to do some testing and I finally found some time and
> > > infrastructure.
> > >
> > > So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a
> > > replicated volume with arbiter (2+1) and a VM on KVM (via OpenStack)
> > > with its disk accessed through gfapi. The volume has the virt group
> > > applied (gluster volume set gv_openstack_1 group virt). The VM runs
> > > current (all packages updated) Ubuntu Xenial.
> > >
> > > I set up the following fio job:
> > >
> > > [job1]
> > > ioengine=libaio
> > > size=1g
> > > loops=16
> > > bs=512k
> > > direct=1
> > > filename=/tmp/fio.data2
> > >
> > > When I run fio fio.job and reboot one of the data nodes, the IO
> > > statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while,
> > > the root filesystem gets remounted read-only.
> > >
> > > If you care about infrastructure, setup details etc., do not hesitate
> > > to ask.
> > >
> > > Gluster info on the volume:
> > >
> > > Volume Name: gv_openstack_1
> > > Type: Replicate
> > > Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: gfs-2.san:/export/gfs/gv_1
> > > Brick2: gfs-3.san:/export/gfs/gv_1
> > > Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
> > > Options Reconfigured:
> > > nfs.disable: on
> > > transport.address-family: inet
> > > performance.quick-read: off
> > > performance.read-ahead: off
> > > performance.io-cache: off
> > > performance.stat-prefetch: off
> > > performance.low-prio-threads: 32
> > > network.remote-dio: enable
> > > cluster.eager-lock: enable
> > > cluster.quorum-type: auto
> > > cluster.server-quorum-type: server
> > > cluster.data-self-heal-algorithm: full
> > > cluster.locking-scheme: granular
> > > cluster.shd-max-threads: 8
> > > cluster.shd-wait-qlength: 10000
> > > features.shard: on
> > > user.cifs: off
> > >
> > > Partial KVM XML dump:
> > >
> > > <disk type='network' device='disk'>
> > >   <driver name='qemu' type='raw' cache='none'/>
> > >   <source protocol='gluster'
> > >     name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
> > >     <host name='10.0.1.201' port='24007'/>
> > >   </source>
> > >   <backingStore/>
> > >   <target dev='vda' bus='virtio'/>
> > >   <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
> > >   <alias name='virtio-disk0'/>
> > >   <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
> > >     function='0x0'/>
> > > </disk>
> > >
> > > Networking is LACP on the data nodes, a stack of Juniper EX4550s
> > > (10Gbps SFP+), a separate VLAN for Gluster traffic, and SSD only on
> > > all Gluster nodes (including the arbiter).
> > >
> > > I would really love to know what I am doing wrong, because this has
> > > been my experience with Gluster for a long time, and the reason I
> > > would not recommend it as a VM storage backend in a production
> > > environment where you cannot start/stop VMs on your own (e.g.
> > > providing private clouds for customers).
> > > -ps
> > >
> > > On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
> > > > On 30-08-2017 17:07, Ivan Rossi wrote:
> > > >>
> > > >> There has been a bug associated with sharding that led to VM
> > > >> corruption, and it had been around for a long time (difficult to
> > > >> reproduce, I understood). I have not seen reports on it for some
> > > >> time after the last fix, so hopefully VM hosting is now stable.
> > > >
> > > > Mmmm... this is precisely the kind of bug that scares me... data
> > > > corruption :|
> > > > Any more information on what causes it and how to resolve it? Even
> > > > if it is a solved bug in newer Gluster releases, knowledge of how
> > > > to treat it would be valuable.
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > Danti Gionatan
> > > > Supporto Tecnico
> > > > Assyoma S.r.l. - www.assyoma.it
> > > > email: g.danti at assyoma.it - info at assyoma.it
> > > > GPG public key ID: FF5F32A8

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
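For reference, Alastair's one-liner above corresponds roughly to the commands below. This is a hedged sketch, not verified against this cluster: the volume name `gv_openstack_1` is taken from Pavel's setup, `cluster.server-quorum-ratio` is a cluster-wide option (hence `all`), and the snippet only prints the commands when the gluster CLI is absent.

```shell
#!/bin/sh
# Sketch: apply the server-quorum settings discussed in the thread.
# Assumptions: run on a trusted-pool peer with glusterd up; volume name
# gv_openstack_1 comes from Pavel's "gluster volume info" output.
VOL="gv_openstack_1"
RATIO_CMD="gluster volume set all cluster.server-quorum-ratio 51%"
TYPE_CMD="gluster volume set $VOL cluster.server-quorum-type server"

if command -v gluster >/dev/null 2>&1; then
    $RATIO_CMD && $TYPE_CMD
else
    # No gluster CLI here: show what would be run instead of failing.
    printf 'DRY-RUN: %s\n' "$RATIO_CMD" "$TYPE_CMD"
fi
```

Note that Pavel's volume already has `cluster.server-quorum-type: server` set via the virt group, so on his cluster only the ratio change would be new.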
Hi Neil, the docs mention two live nodes of a replica 3 blaming each other and refusing to do IO.

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume

On Sep 7, 2017 17:52, "Alastair Neil" <ajneil.tech at gmail.com> wrote:

> *shrug* I don't use arbiter for VM workloads, just straight replica 3.
> There are some gotchas with using an arbiter for VM workloads. If
> quorum-type is auto and a brick that is not the arbiter drops out, and the
> remaining up brick is dirty as far as the arbiter is concerned (i.e. the
> only good copy is on the down brick), you will get ENOTCONN and your VMs
> will halt on IO.
>
> [snip]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
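For anyone wanting to reproduce Pavel's test, the fio job from the thread can be recreated as below. This is a sketch: it only writes the job file; the actual run (commented out) assumes fio with libaio support is installed inside the guest, and the reproduction step is rebooting one data node while the job is running.

```shell
# Recreate the fio job file Pavel describes; run fio inside the guest VM
# while rebooting one Gluster data node to reproduce the reported IO hang.
cat > fio.job <<'EOF'
[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2
EOF

# fio fio.job   # uncomment inside the test VM (requires fio + libaio)
```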
True, but working your way into that problem with replica 3 is a lot harder than with replica 2 + arbiter.

On 7 September 2017 at 14:06, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:

> Hi Neil, the docs mention two live nodes of a replica 3 blaming each
> other and refusing to do IO.
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume
>
> [snip]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
I've always wondered what the scenario for these situations is (aside from the docs' description of nodes coming up and down). Aren't Gluster writes atomic across all nodes? I seem to recall Jeff Darcy stating that years ago. So a clean shutdown for maintenance shouldn't be a problem at all: if a node didn't get a write, it is the one likely to fail. So are we really only talking about a crash with data in flight? I suppose a crash during the heal phase after a shutdown could also trigger this issue, especially if you are not using sharding and have huge VM files.

On 9/7/2017 11:06 AM, Pavel Szalbot wrote:

> Hi Neil, the docs mention two live nodes of a replica 3 blaming each
> other and refusing to do IO.
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/#1-replica-3-volume
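A practical guard against the heal-phase crash scenario described above is to confirm heals have finished before taking the next node down. The sketch below uses the standard heal-info commands (output format varies by Gluster release; the volume name is the one from this thread, and the snippet only prints the commands where no gluster CLI is present):

```shell
#!/bin/sh
# Sketch: check heal state before planned maintenance on another node.
# Assumption: volume name gv_openstack_1 from earlier in this thread.
VOL="gv_openstack_1"

if command -v gluster >/dev/null 2>&1; then
    # Entries pending heal: should be empty before the next node goes down.
    gluster volume heal "$VOL" info
    # Files the cluster already considers split-brain.
    gluster volume heal "$VOL" info split-brain
else
    printf 'DRY-RUN: gluster volume heal %s info [split-brain]\n' "$VOL"
fi
```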