FNU Raghavendra Manjunath
2016-Feb-08 15:40 UTC
[Gluster-users] Failure of one brick leads to VM crashes
+ Pranith

In the meantime, can you please provide the logs of all the gluster
server machines and the client machines? Logs can be found in the
/var/log/glusterfs directory.

Regards,
Raghavendra

On Mon, Feb 8, 2016 at 9:20 AM, Dominique Roux
<dominique.roux at ungleich.ch> wrote:

> Hi guys,
>
> I ran into a problem a week ago.
> In our environment we have three servers in a quorum. The gluster
> volume is spread over two bricks and is of type replicated.
>
> To simulate the failure of one brick, we isolated one of the two
> bricks with iptables, so that communication with the other two peers
> was no longer possible.
> After that, VMs (OpenNebula) which had I/O at that moment crashed.
> We killed glusterfsd hard (kill -9) and restarted it, which made
> things work again (we also had to restart the failed VMs). But I
> think this shouldn't happen, since quorum was still met (2/3 hosts
> were still up and connected).
>
> Here is some information about our system:
> OS: CentOS Linux release 7.1.1503
> GlusterFS version: glusterfs 3.7.3
>
> gluster volume info:
>
> Volume Name: cluster1
> Type: Replicate
> Volume ID:
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: srv01:/home/gluster
> Brick2: srv02:/home/gluster
> Options Reconfigured:
> cluster.self-heal-daemon: enable
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: on
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> server.allow-insecure: on
> nfs.disable: 1
>
> Hope you can help us.
>
> Thanks a lot.
>
> Best regards
> Dominique
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
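The isolation test described above can be reproduced with iptables
rules along these lines (a minimal sketch; the peer names srv01 and
entrance stand in for the real peer addresses of this deployment):

  # Run on the brick host that should "fail": drop all traffic to and
  # from the two remaining peers (hostnames are placeholders).
  iptables -A INPUT  -s srv01    -j DROP
  iptables -A INPUT  -s entrance -j DROP
  iptables -A OUTPUT -d srv01    -j DROP
  iptables -A OUTPUT -d entrance -j DROP

  # End the test by deleting the same rules:
  iptables -D INPUT  -s srv01    -j DROP
  iptables -D INPUT  -s entrance -j DROP
  iptables -D OUTPUT -d srv01    -j DROP
  iptables -D OUTPUT -d entrance -j DROP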
Logs are attached.

For clarification:
vmhost1-cluster1 -> Brick 1
vmhost2-cluster2 -> Brick 2
entrance -> Peer
Time of testing: 31.01.2016 16:13

Thanks for your help.

Regards,
Dominique

Become part of modern working in Glarnerland at www.digitalglarus.ch!
Read the news on Twitter: www.twitter.com/DigitalGlarus
Join the discussion on Facebook: www.facebook.com/digitalglarus

On 02/08/2016 04:40 PM, FNU Raghavendra Manjunath wrote:
> + Pranith
>
> In the meantime, can you please provide the logs of all the gluster
> server machines and the client machines?
>
> Logs can be found in /var/log/glusterfs directory.
>
> Regards,
> Raghavendra
>
> [...]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs_glusterlist.tar.xz
Type: application/x-xz
Size: 17508 bytes
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160209/02c97dfc/attachment.xz>
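On the quorum side of the question: with cluster.server-quorum-type
set to server, glusterd kills the local bricks on a node that loses
quorum with the rest of the trusted pool, while the side that still
has quorum keeps serving. A rough sketch of the related options
(gluster 3.7 CLI; cluster1 as above, the ratio value 51 is only an
example):

  # The server quorum ratio is a cluster-wide setting, applied to "all":
  gluster volume set all cluster.server-quorum-ratio 51

  # Client-side quorum for the replica; "auto" means a majority of
  # bricks must be reachable (with replica 2: the first brick).
  gluster volume set cluster1 cluster.quorum-type auto

  # Review the resulting configuration:
  gluster volume info cluster1

Note that with only two data bricks neither replica side has a strict
majority on its own, which is why replica 2 volumes are hard to
protect with quorum alone; adding an arbiter brick (available from
GlusterFS 3.7 onwards) gives the volume a real 2-of-3 majority.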