On 30-08-2017 17:07, Ivan Rossi wrote:
> There has been a bug associated with sharding that led to VM corruption
> and that has been around for a long time (difficult to reproduce, as I
> understand it). I have not seen reports about it for some time after the
> last fix, so hopefully VM hosting is now stable.

Mmmm... this is precisely the kind of bug that scares me... data
corruption :|
Any more information on what causes it and how to resolve it? Even if it
is a solved bug in newer Gluster releases, knowing how to handle it would
be valuable.

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
lemonnierk at ulrar.net
2017-Sep-03 22:25 UTC
[Gluster-users] GlusterFS as virtual machine storage
On Sun, Sep 03, 2017 at 10:21:33PM +0200, Gionatan Danti wrote:
> On 30-08-2017 17:07, Ivan Rossi wrote:
> > There has been a bug associated with sharding that led to VM corruption
> > and that has been around for a long time (difficult to reproduce, as I
> > understand it). I have not seen reports about it for some time after
> > the last fix, so hopefully VM hosting is now stable.
>
> Mmmm... this is precisely the kind of bug that scares me... data
> corruption :|
> Any more information on what causes it and how to resolve it? Even if it
> is a solved bug in newer Gluster releases, knowing how to handle it
> would be valuable.

I don't have a solution; instead of growing my volumes I just create new
ones. I couldn't tell you if it's solved in a recent release, as I never
had the courage to try it out :) It's also a bit hard to trigger, so
having it work once might not be enough.
Hi all,

I have promised to do some testing and I finally found some time and
infrastructure.

So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a
replicated volume with arbiter (2+1) and a VM on KVM (via OpenStack)
with its disk accessed through gfapi. The volume group is set to virt
(gluster volume set gv_openstack_1 virt). The VM runs a current (all
packages updated) Ubuntu Xenial.

I set up the following fio job:

[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2

When I run fio fio.job and reboot one of the data nodes, the I/O
statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while,
the root filesystem gets remounted read-only.

If you care about the infrastructure, setup details etc., do not
hesitate to ask.

Gluster info on the volume:

Volume Name: gv_openstack_1
Type: Replicate
Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs-2.san:/export/gfs/gv_1
Brick2: gfs-3.san:/export/gfs/gv_1
Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off

Partial KVM XML dump:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
        <host name='10.0.1.201' port='24007'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Networking is LACP on the data nodes, a stack of Juniper EX4550s (10Gbps
SFP+), a separate VLAN for Gluster traffic, and SSDs only on all Gluster
nodes (including the arbiter).

I would really love to know what I am doing wrong, because this has been
my experience with Gluster for a long time, and it is the reason I would
not recommend it as a VM storage backend in a production environment
where you cannot start/stop VMs on your own (e.g. providing private
clouds for customers).
-ps


On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
> Any more information on what causes it and how to resolve it? Even if it
> is a solved bug in newer Gluster releases, knowing how to handle it
> would be valuable.
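A note for anyone reproducing this: while fio is stalled, the state of the
surviving replica can be inspected from one of the remaining Gluster
nodes. A minimal sketch, assuming the volume name gv_openstack_1 from the
output above (exact output varies by Gluster version):

# run on a surviving Gluster node while the test VM is stalled
gluster volume status gv_openstack_1      # are the two remaining bricks and self-heal daemons online?
gluster volume heal gv_openstack_1 info   # entries (shards of the VM image) pending heal after the reboot
gluster volume get gv_openstack_1 cluster.quorum-type
gluster volume get gv_openstack_1 cluster.server-quorum-type

With cluster.quorum-type set to auto, two of three bricks (including the
arbiter) should still satisfy client quorum, so these commands mainly help
rule out a brick or self-heal daemon that did not stay up as expected.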
You need to set cluster.server-quorum-ratio to 51%.

On 6 September 2017 at 10:12, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> When I run fio fio.job and reboot one of the data nodes, the I/O
> statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while,
> the root filesystem gets remounted read-only.
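For reference, cluster.server-quorum-ratio is a cluster-wide option rather
than a per-volume one, so a sketch of the suggested change would look
roughly like this (the 51% value is the one recommended above; the
verification step assumes the volume name from the earlier message and may
differ between Gluster versions):

gluster volume set all cluster.server-quorum-ratio 51%            # applies to every volume in the trusted pool
gluster volume get gv_openstack_1 cluster.server-quorum-ratio     # confirm the value the volume now sees

Note that server quorum governs whether bricks stay up when glusterd peers
drop out, while the client-side quorum options already in the volume
(cluster.quorum-type: auto) decide whether writes are allowed, so both may
be relevant to the stall described above.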