Paul Penev
2014-Apr-06 15:52 UTC
[Gluster-users] libgfapi failover problem on replica bricks
Hello,

I'm having an issue with rebooting bricks holding images for live KVM machines (using libgfapi).

I have a replicated+distributed setup of 4 bricks (2x2). The cluster contains images for a couple of KVM virtual machines.

My problem is that when I reboot a brick containing an image of a VM, the VM will start throwing disk errors and eventually die.

The gluster volume is made like this:

# gluster vol info pool

Volume Name: pool
Type: Distributed-Replicate
Volume ID: xxxxxxxxxxxxxxxxxxxx
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: srv10g:/data/gluster/brick
Brick2: srv11g:/data/gluster/brick
Brick3: srv12g:/data/gluster/brick
Brick4: srv13g:/data/gluster/brick
Options Reconfigured:
network.ping-timeout: 10
cluster.server-quorum-type: server
diagnostics.client-log-level: WARNING
auth.allow: 192.168.0.*,127.*
nfs.disable: on

The KVM instances run on the same gluster bricks, with disks mounted as:
file=gluster://localhost/pool/images/vm-xxx-disk-1.raw,.......,cache=writethrough,aio=native

My self-heal backlog is not always 0. It looks like some writes are not going to all bricks at the same time (?).

gluster vol heal pool info

sometimes shows the images needing sync on one brick, the other, or both. There are no network problems or errors on the wire.

Any ideas what could be causing this?

Thanks.
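For reference, a minimal sketch of what the setup above translates to in practice. Only the gluster:// URL and the cache/aio flags come from the post; the rest of the qemu command line (memory size, if=virtio, format=raw) is assumed for illustration.

# Guest attaching its image over libgfapi, matching the drive string above
# (everything except the URL and cache/aio settings is assumed):
qemu-system-x86_64 -enable-kvm -m 2048 \
  -drive file=gluster://localhost/pool/images/vm-xxx-disk-1.raw,if=virtio,format=raw,cache=writethrough,aio=native

# Inspecting the self-heal backlog mentioned above:
gluster volume heal pool info               # entries pending heal, per brick
gluster volume heal pool info split-brain   # any split-brain entries?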
Fabio Rosati
2014-Apr-09 08:19 UTC
[Gluster-users] libgfapi failover problem on replica bricks
Hi Paul,

you're not alone. I get the same issue after rebooting a brick belonging to a 2 x 2 volume, and the same is true for João P. and Nick M. (added in cc).

[root at networker ~]# gluster volume info gv_pri

Volume Name: gv_pri
Type: Distributed-Replicate
Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: nw1glus.gem.local:/glustexp/pri1/brick
Brick2: nw2glus.gem.local:/glustexp/pri1/brick
Brick3: nw3glus.gem.local:/glustexp/pri2/brick
Brick4: nw4glus.gem.local:/glustexp/pri2/brick
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
server.allow-insecure: on
network.remote-dio: on
performance.write-behind-window-size: 16MB
performance.cache-size: 128MB

I hope someone will address this problem in the near future, since not being able to shut down a server hosting a brick is a big limitation.

It seems someone solved the problem using cgroups: http://www.gluster.org/author/andrew-lau/

Anyway, I think it's not easy to implement, because cgroups is already configured and in use for libvirt; if I had a test environment and some spare time I would have tried.

Regards,
Fabio Rosati

----- Original message -----
From: "Paul Penev" <ppquant at gmail.com>
To: Gluster-users at gluster.org
Sent: Sunday, 6 April 2014 17:52:53
Subject: [Gluster-users] libgfapi failover problem on replica bricks
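For anyone reproducing this volume, the options listed above map onto plain "gluster volume set" calls (values copied from the volume info output; the glusterd.vol line is the usual companion to server.allow-insecure for libgfapi clients, and is an assumption about this setup, not stated in the post).

# Volume options, as shown in the volume info above:
gluster volume set gv_pri storage.owner-uid 107
gluster volume set gv_pri storage.owner-gid 107
gluster volume set gv_pri server.allow-insecure on
gluster volume set gv_pri network.remote-dio on
gluster volume set gv_pri performance.write-behind-window-size 16MB
gluster volume set gv_pri performance.cache-size 128MB

# Usually paired with the following in /etc/glusterfs/glusterd.vol on each
# server (then restart glusterd) so libgfapi clients connecting from
# unprivileged ports are accepted -- assumed here, not stated in the post:
#   option rpc-auth-allow-insecure on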
Paul Penev
2014-Apr-16 16:20 UTC
[Gluster-users] libgfapi failover problem on replica bricks
>> I can easily reproduce the problem on this cluster. It appears that
>> there is a "primary" replica and a "secondary" replica.
>>
>> If I reboot or kill the glusterfs process there are no problems on the
>> running VM.
>
> Good. That is as expected.

Sorry, I was not clear enough. I meant that if I reboot the "secondary" replica, there are no problems.

>> If I reboot or "killall -KILL glusterfsd" the primary replica (so I
>> don't let it terminate properly), I can block the VM each time.
>
> Have you followed my blog advice to prevent the vm from remounting the
> image filesystem read-only, and waited ping-timeout seconds (42 by default)?

I have not followed your advice, but there is a difference: I get i/o errors *reading* from the disk. Once the problem kicks in, I cannot issue commands (like ls) because they can't be read.

There is a problem with that setup: it cannot be implemented on Windows machines (which are more vulnerable), nor on machines over which I have no control (customers).

>> If I "reset" the VM it will not find the boot disk.
>
> Somewhat expected if within the ping-timeout.

The issue persists beyond the ping-timeout. The KVM process needs to be reinitialized. I guess libgfapi needs to reconnect from scratch.

>> If I power down and power up the VM, then it will boot but will find
>> corruption on disk during the boot that requires fixing.
>
> Expected, since the vm doesn't use the image filesystem synchronously.
> You can change that with mount options at the cost of performance.

Ok. I understand this point.

> Unless you wait for ping-timeout and then continue writing, the replica is
> actually still in sync. It's only out of sync if you write to one replica
> but not the other.
>
> You can shorten the ping timeout. There is a cost to reconnection if you
> do. Be sure to test a scenario with servers under production loads and see
> what the performance degradation during a reconnect is. Balance your needs
> appropriately.

Could you please elaborate on the cost of reconnection?

I will try to run with a very short ping timeout (2 sec) and see if the problem is in the ping-timeout or perhaps not.

Paul
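For completeness, a rough sketch of the two knobs discussed in this exchange. The 2-second value is the one Paul says he will test (the default is 42, and his volume currently uses 10); the guest-side fstab line is only one possible reading of the blog advice about read-only remounts, and is an assumption here.

# Shorten the client ping timeout on the volume:
gluster volume set pool network.ping-timeout 2
gluster volume info pool    # confirm the new value under "Options Reconfigured"

# One way to keep a Linux guest from remounting its filesystem read-only
# when writes stall past the timeout (ext3/ext4 mount option; as Paul notes,
# there is no equivalent for Windows guests) -- in the guest's /etc/fstab:
#   /dev/vda1  /  ext4  defaults,errors=continue  0  1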