Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
minutes. SIGTERM, on the other hand, causes a crash, but this time it is
not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.

-ps

On Fri, Sep 8, 2017 at 1:56 PM, Diego Remolina <dijuremo at gmail.com> wrote:
> I currently only have a Windows 2012 R2 server VM in testing on top of
> the gluster storage, so I will have to take some time to provision a
> couple Linux VMs with both ext4 and XFS to see what happens on those.
>
> The Windows server VM is OK with killall glusterfsd, but when the 42
> second timeout goes into effect, it gets paused and I have to go into
> RHEVM to un-pause it.
>
> Diego
>
> On Fri, Sep 8, 2017 at 7:53 AM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>> 2017-09-08 13:44 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>>> I did not test SIGKILL because I suppose that if graceful exit is bad,
>>> SIGKILL will be as well. This assumption might be wrong, so I will test
>>> it. It would be interesting to see the client keep working in case of a
>>> crash (SIGKILL) and not in case of a graceful exit of glusterfsd.
>>
>> Exactly. If this happens, there is probably a bug in gluster's signal
>> management.
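The two failure modes being compared can be reproduced on a brick node
roughly as follows, and the 42-second pause Diego mentions matches the
default network.ping-timeout, which is tunable per volume. The volume name
"datastore" below is only a placeholder, not taken from the thread:

  # abrupt "crash" of the brick processes - no cleanup path is run
  killall -9 glusterfsd

  # graceful termination - glusterfsd gets a chance to shut down cleanly
  killall glusterfsd

  # inspect the ping timeout behind the 42-second VM pause
  gluster volume get datastore network.ping-timeout

  # lowering it shortens the pause, but very low values are generally
  # discouraged on busy clusters
  gluster volume set datastore network.ping-timeout 10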
Gandalf Corvotempesta
2017-Sep-08 12:13 UTC
[Gluster-users] GlusterFS as virtual machine storage
2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
> -ps

So it seems to be resilient to server crashes but not to server shutdowns :)
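If the distinction really is crash versus clean shutdown, both cases can
also be simulated at the host level rather than by signalling glusterfsd
directly; this is only a sketch, and whether glusterfsd actually receives
SIGTERM on shutdown depends on the distribution's unit files:

  # simulate a hard server crash (kernel panics immediately, nothing cleans up)
  echo 1 > /proc/sys/kernel/sysrq
  echo c > /proc/sysrq-trigger

  # clean shutdown - the init system terminates glusterd/glusterfsd
  systemctl poweroff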
Btw after a few more seconds in the SIGTERM scenario, the VM kind of revived
and seems to be fine... And after a few more restarts of the fio job, I got
an I/O error.

-ps

On Fri, Sep 8, 2017 at 2:11 PM, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
> -ps
>
> On Fri, Sep 8, 2017 at 1:56 PM, Diego Remolina <dijuremo at gmail.com> wrote:
>> I currently only have a Windows 2012 R2 server VM in testing on top of
>> the gluster storage, so I will have to take some time to provision a
>> couple Linux VMs with both ext4 and XFS to see what happens on those.
>>
>> The Windows server VM is OK with killall glusterfsd, but when the 42
>> second timeout goes into effect, it gets paused and I have to go into
>> RHEVM to un-pause it.
>>
>> Diego
>>
>> On Fri, Sep 8, 2017 at 7:53 AM, Gandalf Corvotempesta
>> <gandalf.corvotempesta at gmail.com> wrote:
>>> 2017-09-08 13:44 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>>>> I did not test SIGKILL because I suppose that if graceful exit is bad,
>>>> SIGKILL will be as well. This assumption might be wrong, so I will test
>>>> it. It would be interesting to see the client keep working in case of a
>>>> crash (SIGKILL) and not in case of a graceful exit of glusterfsd.
>>>
>>> Exactly. If this happens, there is probably a bug in gluster's signal
>>> management.
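Pavel's actual fio job is not shown in the thread; a job along these lines,
run inside the guest against its virtual disk, would exercise the same write
path (all parameters are illustrative):

  fio --name=gluster-vm-test --filename=/root/fio.test --size=1G \
      --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
      --iodepth=32 --runtime=60 --time_based --group_reporting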
Well, I really do not like the non-deterministic characteristic of it.
However, a server crash has never occurred in my production environment -
only upgrades and reboots ;-)

-ps

On Fri, Sep 8, 2017 at 2:13 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:
> 2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
>> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
>> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
>> -ps
>
> So it seems to be resilient to server crashes but not to server shutdowns :)
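For planned upgrades and reboots of a replica node, the usual precaution is
to confirm there is nothing left to heal before taking the node down, and
again before moving on to the next one. A rough sketch, assuming a volume
named "datastore" and that the packaging does not stop brick processes
together with glusterd:

  # on any node: every brick should report zero unhealed entries
  gluster volume heal datastore info

  # on the node going down: stop the management daemon, then the bricks
  systemctl stop glusterd
  killall glusterfsd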
Pavel,

Is there a difference between the native client (fuse) and libgfapi with
regard to the crashing/read-only behaviour?

We use Rep2 + Arb and can shut down a node cleanly, without issue on our VMs.
We do it all the time for upgrades and maintenance. However, we are still on
the native client as we haven't had time to work on libgfapi yet. Maybe that
is more tolerant. We have mostly Linux VMs with XFS filesystems.

During the downtime, the VMs continue to run at normal speed. In this case we
migrated the VMs to data node 2 (c2g.gluster) and shut down c1g.gluster to do
some upgrades.

# gluster peer status
Number of Peers: 2

Hostname: c1g.gluster
Uuid: 91be2005-30e6-462b-a66e-773913cacab6
State: Peer in Cluster (Disconnected)

Hostname: arb-c2.gluster
Uuid: 20862755-e54e-4b79-96a8-59e78c6a6a2e
State: Peer in Cluster (Connected)

# gluster volume status
Status of volume: brick1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c2g.gluster:/GLUSTER/brick1           49152     0          Y       5194
Brick arb-c2.gluster:/GLUSTER/brick1        49152     0          Y       3647
Self-heal Daemon on localhost               N/A       N/A        Y       5214
Self-heal Daemon on arb-c2.gluster          N/A       N/A        Y       3667

Task Status of Volume brick1
------------------------------------------------------------------------------
There are no active volume tasks

When we return the c1g node to service, we do see a "pause" in the VMs as the
shards heal. By pause I mean a terminal session gets spongy, but that passes
pretty quickly.

Also, are your VMs mounted in libvirt with caching? We always use cache='none'
so we can migrate around easily.

Finally, you seem to be using oVirt/RHEV. Is it possible that your platform is
triggering a protective response on the VMs (by suspending them)?

-wk

On 9/8/2017 5:13 AM, Gandalf Corvotempesta wrote:
> 2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
>> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
>> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
>> -ps
>
> So it seems to be resilient to server crashes but not to server shutdowns :)
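The fuse-versus-libgfapi difference WK asks about is visible in the libvirt
domain XML: a fuse-backed disk points at an image file on a glusterfs mount,
while a libgfapi disk connects to the volume directly; cache='none' can be
used in both cases. The snippets below are illustrative only (domain, volume,
file and host names are placeholders):

  virsh dumpxml myvm | sed -n '/<disk/,/<\/disk>/p'

  <!-- fuse: the image lives on a glusterfs fuse mount on the hypervisor -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/mnt/gluster/datastore/myvm.qcow2'/>
    <target dev='vda' bus='virtio'/>
  </disk>

  <!-- libgfapi: qemu talks to the gluster volume over the network -->
  <disk type='network' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source protocol='gluster' name='datastore/myvm.qcow2'>
      <host name='c2g.gluster' port='24007'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>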