That really isn't an arbiter issue, or for that matter a Gluster issue. We have
seen the same thing with vanilla NAS servers that had some issue or another.
Arbiter simply makes it less likely to be a problem than replica 2, but in turn
arbiter is less 'safe' than replica 3.

However, in regards to Gluster and RO behaviour:

The default SCSI disk timeout for most OS versions is 30 seconds and the
Gluster network.ping-timeout is 42, so yes, you can trigger an RO event.

# cat /sys/block/sda/device/timeout
30

Though it is easy enough to raise, as Pavel mentioned (see the persistence
sketch after this message):

# echo 90 > /sys/block/sda/device/timeout

As a purely observational note, we have noticed that EXT3/4 filesystems on VMs
will go read-only much more easily than XFS systems (even with the default
timeout and regardless of storage type). We have always wondered about that,
though part of that observation is biased because we tend to use XFS on newer
VMs, which means newer, better kernels.

Likewise, virtio "disks" don't even have a timeout value that I am aware of,
and I don't recall them being particularly sensitive to disk issues on
Gluster, NFS or DAS.

All our newer VMs use virtio instead of SATA/IDE emulation AND XFS, so we
rarely see an RO situation, and when we do, it was a good thing the VMs went
RO to protect themselves while the storage system freaked out.

On 8/23/2017 12:26 PM, lemonnierk at ulrar.net wrote:
> Really? I can't see why. But I've never used arbiter, so you probably
> know more about this than I do.
>
> In any case, with replica 3, I've never had a problem.
>
> On Wed, Aug 23, 2017 at 09:13:28PM +0200, Pavel Szalbot wrote:
>> Hi, I believe it is not that simple. Even a replica 2 + arbiter volume
>> with the default network.ping-timeout will cause the underlying VM to
>> remount its filesystem as read-only (a device error will occur) unless
>> you tune the mount options in the VM's fstab.
>> -ps
>>
>> On Wed, Aug 23, 2017 at 6:59 PM, <lemonnierk at ulrar.net> wrote:
>>> What he is saying is that, on a two-node volume, upgrading a node will
>>> cause the volume to go down. That's nothing weird; you really should
>>> use 3 nodes.
>>>
>>> On Wed, Aug 23, 2017 at 06:51:55PM +0200, Gionatan Danti wrote:
>>>> On 23-08-2017 18:14, Pavel Szalbot wrote:
>>>>> Hi, after many VM crashes during upgrades of Gluster, losing network
>>>>> connectivity on one node etc., I would advise running replica 2 with
>>>>> arbiter.
>>>> Hi Pavel, this is bad news :(
>>>> So, in your case at least, Gluster was not stable? Something as simple
>>>> as an update could crash it?
>>>>
>>>>> I once even managed to break this setup (with arbiter) due to network
>>>>> partitioning - one data node never healed and I had to restore from
>>>>> backups (it was easier, and it was kind of non-production). Be
>>>>> extremely careful and plan for failure.
>>>> I would use VM locking via sanlock or virtlock, so a split brain should
>>>> not cause simultaneous changes on both replicas. I am more concerned
>>>> about volume heal time: what will happen if the standby node
>>>> crashes/reboots? Will *all* data be re-synced from the master, or only
>>>> the changed bits? As stated above, I would like to avoid using
>>>> sharding...
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Danti Gionatan
>>>> Supporto Tecnico
>>>> Assyoma S.r.l.
>>>> www.assyoma.it
>>>> email: g.danti at assyoma.it - info at assyoma.it
>>>> GPG public key ID: FF5F32A8
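One note on the timeout commands above: the echo is lost at reboot. Below is a
minimal, untested sketch of making it persistent in a udev-based guest that
uses SATA/IDE emulation, plus the Gluster side of the 30 s vs. 42 s gap. The
rule file name and the volume name "myvol" are placeholders, not anything from
this thread.

# cat > /etc/udev/rules.d/99-disk-timeout.rules <<'EOF'
# raise the SCSI command timeout to 90s on emulated sd* disks (applied at boot/hotplug)
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="90"
EOF

# gluster volume get myvol network.ping-timeout
# gluster volume set myvol network.ping-timeout 42

Lowering network.ping-timeout below the guest disk timeout is the other way to
close the gap, but the Gluster documentation generally warns against reducing
it, since short timeouts make transient network blips look like brick failures.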
Hi,

On Thu, Aug 24, 2017 at 2:13 AM, WK <wkmail at bneit.com> wrote:
> The default timeout for most OS versions is 30 seconds and the Gluster
> timeout is 42, so yes you can trigger an RO event.

I get a read-only mount within approximately 2 seconds after a failed IO.

> Though it is easy enough to raise as Pavel mentioned
>
> # echo 90 > /sys/block/sda/device/timeout

AFAIK this is applicable only to directly attached block devices
(non-virtualized).

> Likewise virtio "disks" don't even have a timeout value that I am aware of
> and I don't recall them being extremely sensitive to disk issues on either
> Gluster, NFS or DAS.

We use only virtio and these problems are persistent - temporarily
suspending a node (e.g. for a HW or Gluster upgrade, or a reboot) is very
scary, because we often end up with read-only filesystems on all VMs.

However, we use ext4, so I cannot comment on XFS.

This discussion will probably end before I migrate VMs from Gluster to
local storage on our OpenStack nodes, but I might run some tests
afterwards and keep you posted.
-ps
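The fstab tuning Pavel mentioned earlier in the thread is, for ext4, presumably
the errors= mount option (and the matching superblock default). A rough sketch
inside the guest - /dev/vda1 is just an example device, and note WK's point
above that going read-only is a protection mechanism, so "continue" is a
deliberate trade-off rather than a fix.

Check what is currently in effect (superblock default plus any errors= option
in fstab):

# tune2fs -l /dev/vda1 | grep -i 'errors behavior'

Either change the superblock default, or override it per mount in /etc/fstab:

# tune2fs -e continue /dev/vda1

/dev/vda1  /  ext4  defaults,errors=continue  0  1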
On 8/23/2017 10:44 PM, Pavel Szalbot wrote:
> Hi,
>
> On Thu, Aug 24, 2017 at 2:13 AM, WK <wkmail at bneit.com> wrote:
>> The default timeout for most OS versions is 30 seconds and the Gluster
>> timeout is 42, so yes you can trigger an RO event.
> I get read-only mount within approximately 2 seconds after failed IO.

Hmm, we don't see that, even on busy VMs. We ARE using QCOW2 disk images,
though.

Also, though we no longer use oVirt, I am still on the list. They are heavy
Gluster users and they would be howling if they all had your experience.

>> Though it is easy enough to raise as Pavel mentioned
>>
>> # echo 90 > /sys/block/sda/device/timeout
> AFAIK this is applicable only for directly attached block devices
> (non-virtualized).

No, if you use SATA/IDE emulation (NOT virtio) it is there WITHIN the VM. We
have a lot of legacy VMs from older projects/workloads that have that, and we
haven't bothered changing them because "they are working fine now".

It is NOT there on virtio.

>> Likewise virtio "disks" don't even have a timeout value that I am aware of
>> and I don't recall them being extremely sensitive to disk issues on either
>> Gluster, NFS or DAS.
> We use only virtio and these problems are persistent - temporarily
> suspending a node (e.g. HW or Gluster upgrade, reboot) is very scary,
> because we often end up with read-only filesystems on all VMs.
>
> However we use ext4, so I cannot comment on XFS.

We use the FUSE mount, because we are lazy and haven't upgraded to libgfapi.
I hope to start a new cluster with libgfapi shortly because of the better
performance.

Also, we use a localhost mount for the gluster driveset on each compute node
(i.e. so-called hyperconverged), so the only gluster-only kit is the
lightweight arbiter box.

So those VMs in the gluster 'pool' have a local write and then only one
off-server write (to the other gluster-enabled compute host), which means
pretty good performance.

We use the Gluster-included 'virt' tuning set of:

performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off

We do play with shard size and have settled on 64M, though I've seen
recommendations of 128M and 512M for VMs. We didn't really notice much of a
difference with any of those, as long as they were at least 64M.

> This discussion will probably end before I migrate VMs from Gluster to
> local storage on our Openstack nodes, but I might run some tests
> afterwards and keep you posted.

I would be interested in your results.

You may also look into Ceph. It is more complicated than Gluster (well, more
complicated than our simple little Gluster arrangement), but the OpenStack
people swear by it. It wasn't suited to our needs, but it tested well when we
looked into it last year.
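For anyone wanting to reproduce that setup, a rough sketch of applying the
profile to a volume. "gv0" is a placeholder volume name, and the 'group virt'
shortcut assumes the stock /var/lib/glusterd/groups/virt file shipped with the
Gluster packages, which may not match WK's list exactly.

# gluster volume set gv0 group virt
# gluster volume set gv0 features.shard on
# gluster volume set gv0 features.shard-block-size 64MB
# gluster volume get gv0 features.shard-block-size

As far as I know, changing features.shard-block-size only affects newly
written files; existing images keep the shard size they were created with.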