Darren Austin
2011-Jun-29 10:07 UTC
[Gluster-users] Fwd: Unexpected behaviour during replication heal
Copy for the ML too. Any chance we can have a reply-to header set? :)

Darren.

----- Forwarded Message -----
From: "Darren Austin" <darren-lists at widgit.com>
To: "Marco Agostini" <comunelevico at gmail.com>
Sent: Wednesday, 29 June, 2011 11:06:28 AM
Subject: Re: [Gluster-users] Unexpected behaviour during replication heal

----- Original Message -----
> from your mnt.log I've seen that you are using GlusterFS-3.1.0 on your
> server.

They were all running 3.2.0 until yesterday, when I upgraded to 3.2.1.

> Actually I'm using Gluster 3.2.1 and it works very well.
> I'm making tests similar to yours: detaching the cable of a GlusterFS
> server while several clients are writing.

This is a slightly different situation to what I'm testing - unplugging the cable is different from firewalling off the other server and client.

With the cable unplugged, the switch can detect that the host is unreachable and return the correct ICMP response to the second server and client - which is probably why things work correctly for you.

When I use iptables -j DROP to simulate an entire EC2 availability zone becoming unreachable (as might happen with Amazon's EC2 set-up), it's unlikely that ICMP response codes will be returned to the server in the still-working availability zone, or to the client(s).

Hope that makes sense :)

Darren.

-- 
Darren Austin - Systems Administrator, Widgit Software.
Tel: +44 (0)1926 333680.  Web: http://www.widgit.com/
26 Queen Street, Cubbington, Warwickshire, CV32 7NA.
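The firewall-based partition described above can be reproduced with a pair of iptables rules. This is only a sketch of the idea: the address 10.49.14.115 stands in for the peer you want to cut off (it matches one of the brick addresses given later in the thread), and the commands must be run as root on the host doing the firewalling.

```shell
# Silently drop all traffic to and from the peer. Unlike pulling the
# cable, -j DROP generates no ICMP errors, so the surviving server and
# clients just see connections hang until they time out -- the EC2
# availability-zone scenario described above.
iptables -I INPUT  -s 10.49.14.115 -j DROP
iptables -I OUTPUT -d 10.49.14.115 -j DROP

# To "heal" the partition afterwards, delete the same rules:
iptables -D INPUT  -s 10.49.14.115 -j DROP
iptables -D OUTPUT -d 10.49.14.115 -j DROP
```

Using -j REJECT instead of -j DROP would send ICMP errors back and behave more like the unplugged-cable case, which is the distinction being made here.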
Darren Austin
2011-Jun-29 10:14 UTC
[Gluster-users] Fwd: Unexpected behaviour during replication heal
----- Forwarded Message -----
From: "Darren Austin" <darren-lists at widgit.com>
To: "Mohit Anchlia" <mohitanchlia at gmail.com>
Sent: Wednesday, 29 June, 2011 11:13:30 AM
Subject: Re: [Gluster-users] Unexpected behaviour during replication heal

----- Original Message -----
> Did you recently upgrade?

I was able to reproduce this problem on both 3.2.0 and 3.2.1. It wasn't an upgrade situation - I deleted the volumes and re-created them for each test.

> Can you also post gluster volume info and your gluster vol files?

'gluster volume info':

Volume Name: data-volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.234.158.226:/data
Brick2: 10.49.14.115:/data

glusterd.vol (same on both servers):

volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume

Other vol files and brick info were posted with my first description of the issue :)

HTH,
Darren.

-- 
Darren Austin - Systems Administrator, Widgit Software.
Tel: +44 (0)1926 333680.  Web: http://www.widgit.com/
26 Queen Street, Cubbington, Warwickshire, CV32 7NA.
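For reference, a two-brick replicated volume matching the 'gluster volume info' output above would normally be set up with commands along these lines (a sketch; the brick addresses and volume name are taken from the thread, the mount point is illustrative, and the commands assume glusterd is running on both servers):

```shell
# Create a 2-way replicated volume over TCP across the two bricks
# listed in the volume info above, then start it.
gluster volume create data-volume replica 2 transport tcp \
    10.234.158.226:/data 10.49.14.115:/data
gluster volume start data-volume

# On a client, mount the volume with the native FUSE client
# (/mnt/data is an example mount point).
mount -t glusterfs 10.234.158.226:/data-volume /mnt/data
```

With replica 2, every file is written to both bricks, which is why a partition between the two servers mid-write triggers the self-heal behaviour being tested in this thread.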
Darren Austin
2011-Jun-29 10:43 UTC
[Gluster-users] Fwd: Unexpected behaviour during replication heal
Another one that went to someone personally, rather than the list - sorry about that :)

Darren.

----- Forwarded Message -----
From: "Darren Austin" <darren-lists at widgit.com>
To: "Mohit Anchlia" <mohitanchlia at gmail.com>
Sent: Wednesday, 29 June, 2011 11:13:30 AM
Subject: Re: [Gluster-users] Unexpected behaviour during replication heal

----- Original Message -----
> Did you recently upgrade?

I was able to reproduce this problem on both 3.2.0 and 3.2.1. It wasn't an upgrade situation - I deleted the volumes and re-created them for each test.

> Can you also post gluster volume info and your gluster vol files?

'gluster volume info':

Volume Name: data-volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.234.158.226:/data
Brick2: 10.49.14.115:/data

glusterd.vol (same on both servers):

volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume

Other vol files and brick info were posted with my first description of the issue :)

HTH,
Darren.

-- 
Darren Austin - Systems Administrator, Widgit Software.
Tel: +44 (0)1926 333680.  Web: http://www.widgit.com/
26 Queen Street, Cubbington, Warwickshire, CV32 7NA.