Daniel Manser
2011-Nov-30 10:11 UTC
[Gluster-users] Split-brains when shutting down Xen domUs
We are running a couple of Xen domUs on a two-node Gluster setup (2 Gluster nodes, 2 dedicated Xen hosts, all machines run CentOS). Each domU image is located in its own volume.

At the precise moment when I shut down the domU (from inside the domU), I got the following log entries on the Xen host (Gluster client):

[2011-11-29 15:28:24.579646] I [afr-self-heal-common.c:537:afr_sh_mark_sources] 0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source detected
[2011-11-29 15:28:24.579707] I [afr-common.c:801:afr_lookup_done] 0-vol0_atmail1_example_org-replicate-0: background data self-heal triggered. path: /atmail1.example.org.img
[2011-11-29 15:28:24.581251] I [afr-self-heal-common.c:537:afr_sh_mark_sources] 0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source detected
[2011-11-29 15:28:24.581282] E [afr-self-heal-data.c:637:afr_sh_data_fix] 0-vol0_atmail1_example_org-replicate-0: Unable to self-heal contents of '/atmail1.example.org.img' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2011-11-29 15:28:24.582075] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-vol0_atmail1_example_org-replicate-0: background data data self-heal completed on /atmail1.example.org.img
[2011-11-29 15:28:24.778445] W [afr-open.c:168:afr_open] 0-vol0_atmail1_example_org-replicate-0: failed to open as split brain seen, returning EIO
[2011-11-29 15:28:24.778503] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 18943778: OPEN() /atmail1.example.org.img => -1 (Input/output error)
[2011-11-29 15:28:24.778585] W [afr-open.c:168:afr_open] 0-vol0_atmail1_example_org-replicate-0: failed to open as split brain seen, returning EIO
[2011-11-29 15:28:24.778610] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 18943779: OPEN() /atmail1.example.org.img => -1 (Input/output error)
[2011-11-29 15:28:25.93271] W [afr-open.c:168:afr_open] 0-vol0_atmail1_example_org-replicate-0: failed to open as split brain seen, returning EIO
[2011-11-29 15:28:25.93327] W [fuse-bridge.c:582:fuse_fd_cbk] 0-glusterfs-fuse: 18943780: OPEN() /atmail1.example.org.img => -1 (Input/output error)

To recover, I had to delete one copy of the image on a Gluster node, trigger self-heal/replication, and then start the domU again. The split-brain does not seem to happen every time, though.

On the second Xen host, the Gluster volume is mounted, but nothing reads from or writes to that volume. I don't think this should be a problem, since Gluster can handle multiple clients.

My volume setup is pretty straightforward:

[root@glu1 ~]# gluster volume info vol0_atmail1_example_org

Volume Name: vol0_atmail1_example_org
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: glu1.example.org:/mnt/vol0/atmail1_example_org
Brick2: glu2.example.org:/mnt/vol0/atmail1_example_org
Options Reconfigured:
network.ping-timeout: 10

I wonder if anyone has run into similar problems with Xen, and what solution they might have come up with.

Daniel
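
Since the split-brain does not occur on every shutdown, it may help to scan the client log for the entries quoted above. The following is a minimal sketch, assuming the log keeps the line layout shown in this post (timestamp, level, source location, translator name, message). The function name `find_split_brain` and the regular expression are illustrative only and are not part of Gluster.

```python
import re

# Assumed line layout, taken from the log excerpt in this post:
# [timestamp] LEVEL [file.c:line:function] translator-name: message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<msg>.*)"
)

# The excerpt spells it both "split-brain" and "split brain".
SPLIT_BRAIN = re.compile(r"split[- ]brain", re.IGNORECASE)


def find_split_brain(lines):
    """Return a list of dicts for log lines whose message mentions split-brain.

    Lines that do not match the assumed layout are skipped silently.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and SPLIT_BRAIN.search(m.group("msg")):
            hits.append(m.groupdict())
    return hits
```

To use it, pass the lines of the client log (for example, an open file object) and inspect the `ts`, `level`, and `msg` fields of each returned entry to see when a shutdown left the image in a suspected split-brain state.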