João Pagaime
2014-Feb-07 12:13 UTC
[Gluster-users] self-heal stops some vms (virtual machines)
Hello all,

I have a replicated volume that holds KVM VMs (virtual machines). I had to stop one gluster server for maintenance. That part of the operation went well: no VM problems after the shutdown.

The problems started after booting the gluster server. Self-healing started as expected, but some VMs locked up with disk problems (time-outs) as the self-heal reached them. Some VMs did survive the self-healing; I suppose those were the ones with low I/O activity or less sensitive to disk problems.

Is there some specific gluster configuration that lets running VMs ride through a self-heal? (cluster.data-self-heal-algorithm is already set to diff.) Are there any recommended tweaks for VMs running on top of gluster?

Current config: gluster 3.3.0-1.el6.x86_64
---------------------
Volume:

# gluster volume info VOL
Volume Name: VOL
Type: Distributed-Replicate
Volume ID: f44182d9-24eb-4953-9cdd-71464f9517e0
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: one-gluster01:/san02-v2
Brick2: one-gluster02:/san02-v2
Brick3: one-gluster01:/san03
Brick4: one-gluster02:/san04
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: on
auth.allow:x
performance.flush-behind: off
cluster.self-heal-window-size: 1
performance.cache-size: 67108864
cluster.data-self-heal-algorithm: diff
performance.io-thread-count: 32
cluster.min-free-disk: 250GB

thanks, best regards,
joao
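For anyone wanting to experiment with the self-heal knobs already mentioned in this config, they are ordinary "gluster volume set" options. The commands below are only a minimal sketch, assuming the volume name VOL from the output above; the background-self-heal-count value is an example to test (if your release exposes the option), not a confirmed fix for the lock-ups.

# sketch only: throttle self-heal on the volume above; values are illustrative
gluster volume set VOL cluster.data-self-heal-algorithm diff   # heal only changed blocks (already set above)
gluster volume set VOL cluster.self-heal-window-size 1         # smallest self-heal window (already set above)
gluster volume set VOL cluster.background-self-heal-count 1    # example value: limit parallel background heals
gluster volume heal VOL info                                   # list entries still pending heal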
Fabio Rosati
2014-Feb-27 09:08 UTC
[Gluster-users] self-heal stops some vms (virtual machines)
Hi all,

I ran into exactly the same problem encountered by João. After rebooting one of the GlusterFS nodes, self-heal starts and some VMs can't access their disk images anymore.

Logs from one of the VMs after one gluster node rebooted:

Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-2): __ext4_get_inode_loc: unable to read inode block - inode=2145, block=417
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15032608
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307552
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307568
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 12972672
Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-1): ext4_find_entry: reading directory #123 offset 0
Feb 25 23:35:47 fwrt2 kernel: Core dump to |/usr/libexec/abrt-hook-ccpp 7 0 2757 0 23 1393367747 e pipe failed
Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 9250632
Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30536)
Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30544)
[...]

A few hours later the VM seemed to be frozen and I had to kill and restart it; no more problems after the reboot.

This is the volume layout:

Volume Name: gv_pri
Type: Distributed-Replicate
Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: nw1glus.gem.local:/glustexp/pri1/brick
Brick2: nw2glus.gem.local:/glustexp/pri1/brick
Brick3: nw3glus.gem.local:/glustexp/pri2/brick
Brick4: nw4glus.gem.local:/glustexp/pri2/brick
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
server.allow-insecure: on
network.remote-dio: on
performance.write-behind-window-size: 16MB
performance.cache-size: 128MB

OS: CentOS 6.5
GlusterFS version: 3.4.2

The qemu-kvm VMs access their qcow2 disk images using the native Gluster support (no FUSE mount). In the Gluster logs I didn't find anything special logged during self-heal, but I can post them if needed.

Does anyone have an idea of what can cause these problems?

Thank you
Fabio
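For context on the "native Gluster support" access path mentioned above (qemu-kvm using libgfapi rather than a FUSE mount), a disk attached that way looks roughly like the sketch below. The host and volume names are taken from the layout above; the image name vm1.qcow2 and the memory size are hypothetical. The server.allow-insecure and storage.owner-uid/gid options already present in the volume config are typically what allows the qemu user to use this path.

# sketch only: qcow2 image accessed over libgfapi via a gluster:// URL
qemu-system-x86_64 -enable-kvm -m 2048 \
  -drive file=gluster://nw1glus.gem.local/gv_pri/vm1.qcow2,if=virtio,format=qcow2,cache=none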