Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
minutes. SIGTERM, on the other hand, causes a crash, but this time it is
not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.

-ps

On Fri, Sep 8, 2017 at 1:56 PM, Diego Remolina <dijuremo at gmail.com> wrote:
> I currently only have a Windows 2012 R2 server VM in testing on top of
> the gluster storage, so I will have to take some time to provision a
> couple Linux VMs with both ext4 and XFS to see what happens on those.
>
> The Windows server VM is OK with killall glusterfsd, but when the 42
> second timeout goes into effect, it gets paused and I have to go into
> RHEVM to un-pause it.
>
> Diego
>
> On Fri, Sep 8, 2017 at 7:53 AM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>> 2017-09-08 13:44 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>>> I did not test SIGKILL because I suppose that if graceful exit is bad,
>>> SIGKILL will be as well. This assumption might be wrong, so I will test
>>> it. It would be interesting to see the client keep working in case of a
>>> crash (SIGKILL) and not in case of a graceful exit of glusterfsd.
>>
>> Exactly. If this happens, there is probably a bug in gluster's signal
>> management.
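The two failure modes being compared can be reproduced on a brick node
roughly as follows, and the 42-second pause Diego mentions matches the
default network.ping-timeout, which is tunable per volume. The volume name
"datastore" below is only a placeholder, not taken from the thread:

  # abrupt "crash" of the brick processes - no cleanup path is run
  killall -9 glusterfsd

  # graceful termination - glusterfsd gets a chance to shut down cleanly
  killall glusterfsd

  # inspect the ping timeout behind the 42-second VM pause
  gluster volume get datastore network.ping-timeout

  # lowering it shortens the pause, but very low values are generally
  # discouraged on busy clusters
  gluster volume set datastore network.ping-timeout 10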
Gandalf Corvotempesta
2017-Sep-08 12:13 UTC
[Gluster-users] GlusterFS as virtual machine storage
2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
> -ps

So it seems to be resilient to server crashes but not to server shutdowns :)
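If the distinction really is crash versus clean shutdown, both cases can
also be simulated at the host level rather than by signalling glusterfsd
directly; this is only a sketch, and whether glusterfsd actually receives
SIGTERM on shutdown depends on the distribution's unit files:

  # simulate a hard server crash (kernel panics immediately, nothing cleans up)
  echo 1 > /proc/sys/kernel/sysrq
  echo c > /proc/sysrq-trigger

  # clean shutdown - the init system terminates glusterd/glusterfsd
  systemctl poweroff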
Btw after a few more seconds in the SIGTERM scenario, the VM kind of revived
and seems to be fine... And after a few more restarts of the fio job, I got
an I/O error.

-ps

On Fri, Sep 8, 2017 at 2:11 PM, Pavel Szalbot <pavel.szalbot at gmail.com> wrote:
> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
> -ps
>
> On Fri, Sep 8, 2017 at 1:56 PM, Diego Remolina <dijuremo at gmail.com> wrote:
>> I currently only have a Windows 2012 R2 server VM in testing on top of
>> the gluster storage, so I will have to take some time to provision a
>> couple Linux VMs with both ext4 and XFS to see what happens on those.
>>
>> The Windows server VM is OK with killall glusterfsd, but when the 42
>> second timeout goes into effect, it gets paused and I have to go into
>> RHEVM to un-pause it.
>>
>> Diego
>>
>> On Fri, Sep 8, 2017 at 7:53 AM, Gandalf Corvotempesta
>> <gandalf.corvotempesta at gmail.com> wrote:
>>> 2017-09-08 13:44 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>>>> I did not test SIGKILL because I suppose that if graceful exit is bad,
>>>> SIGKILL will be as well. This assumption might be wrong, so I will test
>>>> it. It would be interesting to see the client keep working in case of a
>>>> crash (SIGKILL) and not in case of a graceful exit of glusterfsd.
>>>
>>> Exactly. If this happens, there is probably a bug in gluster's signal
>>> management.
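Pavel's actual fio job is not shown in the thread; a job along these lines,
run inside the guest against its virtual disk, would exercise the same write
path (all parameters are illustrative):

  fio --name=gluster-vm-test --filename=/root/fio.test --size=1G \
      --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
      --iodepth=32 --runtime=60 --time_based --group_reporting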
Well, I really do not like the non-deterministic characteristic of it.
However, a server crash has never occurred in my production environment -
only upgrades and reboots ;-)

-ps

On Fri, Sep 8, 2017 at 2:13 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:
> 2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
>> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
>> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
>> -ps
>
> So it seems to be resilient to server crashes but not to server shutdowns :)
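For planned upgrades and reboots of a replica node, the usual precaution is
to confirm there is nothing left to heal before taking the node down, and
again before moving on to the next one. A rough sketch, assuming a volume
named "datastore" and that the packaging does not stop brick processes
together with glusterd:

  # on any node: every brick should report zero unhealed entries
  gluster volume heal datastore info

  # on the node going down: stop the management daemon, then the bricks
  systemctl stop glusterd
  killall glusterfsd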
Pavel,

Is there a difference between the native client (fuse) and libgfapi with
regard to the crashing/read-only behaviour?

We use Rep2 + Arb and can shut down a node cleanly, without issue on our VMs.
We do it all the time for upgrades and maintenance. However, we are still on
the native client as we haven't had time to work on libgfapi yet. Maybe that
is more tolerant. We have mostly Linux VMs with XFS filesystems.

During the downtime, the VMs continue to run at normal speed. In this case we
migrated the VMs to data node 2 (c2g.gluster) and shut down c1g.gluster to do
some upgrades.

# gluster peer status
Number of Peers: 2

Hostname: c1g.gluster
Uuid: 91be2005-30e6-462b-a66e-773913cacab6
State: Peer in Cluster (Disconnected)

Hostname: arb-c2.gluster
Uuid: 20862755-e54e-4b79-96a8-59e78c6a6a2e
State: Peer in Cluster (Connected)

# gluster volume status
Status of volume: brick1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c2g.gluster:/GLUSTER/brick1           49152     0          Y       5194
Brick arb-c2.gluster:/GLUSTER/brick1        49152     0          Y       3647
Self-heal Daemon on localhost               N/A       N/A        Y       5214
Self-heal Daemon on arb-c2.gluster          N/A       N/A        Y       3667

Task Status of Volume brick1
------------------------------------------------------------------------------
There are no active volume tasks

When we return the c1g node to service, we do see a "pause" in the VMs as the
shards heal. By pause I mean a terminal session gets spongy, but that passes
pretty quickly.

Also, are your VMs mounted in libvirt with caching? We always use cache='none'
so we can migrate around easily.

Finally, you seem to be using oVirt/RHEV. Is it possible that your platform is
triggering a protective response on the VMs (by suspending them)?

-wk

On 9/8/2017 5:13 AM, Gandalf Corvotempesta wrote:
> 2017-09-08 14:11 GMT+02:00 Pavel Szalbot <pavel.szalbot at gmail.com>:
>> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after a few
>> minutes. SIGTERM, on the other hand, causes a crash, but this time it is
>> not a read-only remount, but around 10 IOPS tops and 2 IOPS on average.
>> -ps
>
> So it seems to be resilient to server crashes but not to server shutdowns :)
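The fuse-versus-libgfapi difference WK asks about is visible in the libvirt
domain XML: a fuse-backed disk points at an image file on a glusterfs mount,
while a libgfapi disk connects to the volume directly; cache='none' can be
used in both cases. The snippets below are illustrative only (domain, volume,
file and host names are placeholders):

  virsh dumpxml myvm | sed -n '/<disk/,/<\/disk>/p'

  <!-- fuse: the image lives on a glusterfs fuse mount on the hypervisor -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/mnt/gluster/datastore/myvm.qcow2'/>
    <target dev='vda' bus='virtio'/>
  </disk>

  <!-- libgfapi: qemu talks to the gluster volume over the network -->
  <disk type='network' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source protocol='gluster' name='datastore/myvm.qcow2'>
      <host name='c2g.gluster' port='24007'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>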