Sjors Gielen
2015-Jun-07 17:13 UTC
[Gluster-users] Gluster does not seem to detect a split-brain situation
Hi all,

I work at a small, 8-person company that uses Gluster for its primary data
storage. We have a volume called "data" that is replicated over two servers
(details below). This worked perfectly for over a year, but lately we've been
noticing mismatches between the two bricks, so it seems there has been some
split-brain situation that is not being detected or resolved. I have two
questions about this:

1) I expected Gluster to (eventually) detect a situation like this; why
doesn't it?
2) How do I fix this situation? I've tried an explicit 'heal', but that
didn't seem to change anything.

Thanks a lot for your help!
Sjors

------8<------

Volume & peer info: http://pastebin.com/PN7tRXdU

curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c /export/sdb1/data/Case/21000355/studies.dat

bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681 /export/sdb1/data/Case/21000355/studies.dat

# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c /data/Case/21000355/studies.dat

I expected an input/output error after reading this file, because of the
split-brain situation, but got none. There are no entries in the GlusterFS
logs of either bonaire or curacao.

bonaire# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status
bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0

Brick curacao:/export/sdb1/data/
Number of entries: 0

(Same output on curacao, and hours after this, the md5sums on both bricks
still differ.)

curacao# gluster --version
glusterfs 3.6.2 built on Mar 2 2015 14:05:34
Repository revision: git://git.gluster.com/glusterfs.git
(Same version on bonaire)
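For reference: whether replication has anything pending for a file is
recorded in the trusted.afr.* extended attributes on each brick, so a check
along these lines (the xattr names assume the default client naming for a
two-brick replica volume called "data") shows what self-heal actually sees:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat

Each copy would normally carry trusted.afr.data-client-0 and
trusted.afr.data-client-1 counters. Non-zero counters on both bricks, each
blaming the other, is what Gluster flags as split-brain; all-zero or absent
counters mean AFR never recorded a pending operation for the file, so
`heal info` legitimately reports zero entries even though the contents differ.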
Юрий Полторацкий
2015-Jun-07 18:38 UTC
[Gluster-users] Gluster does not seem to detect a split-brain situation
Hi,

First, I do not think that you have a split-brain. I do have one, and my
`gluster volume heal info` shows it:

gluster> volume heal vol3 info
Brick node5.virt.local:/storage/brick12/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1

Brick node6.virt.local:/storage/brick13/
/5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain

Number of entries: 1

Second, you should set up a quorum. You can see some info here:
<https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html>

My config is (two servers with replica 2):

cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.server-quorum-ratio: 51%

And last, I am new to Gluster, so I may be wrong.
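If you want to try the same settings, they can be applied with
`gluster volume set` (the volume name "data" is taken from this thread;
cluster.server-quorum-ratio is cluster-wide, hence "all"):

bonaire# gluster volume set data cluster.server-quorum-type server
bonaire# gluster volume set data cluster.quorum-type fixed
bonaire# gluster volume set data cluster.quorum-count 1
bonaire# gluster volume set all cluster.server-quorum-ratio 51%

One caveat: with only two replicas and quorum-count 1, each side still
satisfies quorum on its own during a partition, so this combination favors
availability rather than preventing divergent writes; cluster.quorum-type
auto (or adding a third peer) is the usual way to actually block them.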
Sjors Gielen
2015-Jun-07 19:13 UTC
[Gluster-users] Gluster does not seem to detect a split-brain situation
I'm reading about quorums now; I haven't set up anything like that yet.

(In reply to Joe Julian, who responded off-list.) The output of getfattr on
bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On curacao, the same command gives no output.

From `gluster volume status`, it seems that while the brick
"curacao:/export/sdb1/data" is online, it has no associated port number.
Curacao can connect to the port number provided by bonaire just fine. There
are no firewalls on/between the two machines; they are on the same subnet,
connected by Ethernet cables and two switches.

By the way, warning messages saying "mismatching ino/dev between file X and
handle Y" just started appearing in
/var/log/glusterfs/bricks/export-sdb1-data.log on bonaire; maybe they only
started just now, even though I kicked off the full self-heal hours ago.

[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard]
0-data-posix: mismatching ino/dev between file
/export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and
handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
(9190215976/2065)

Thanks again!
Sjors
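A brick that `gluster volume status` reports as online but without a port
usually means the glusterfsd process for that brick is not actually running
or reachable; clients then write only to the other copy, and the bricks
drift apart without any split-brain ever being recorded. A low-risk thing to
try (volume name from this thread) is to restart just the missing brick
processes and check again:

curacao# gluster volume start data force
curacao# gluster volume status data

`start force` does not touch the volume's data; it only spawns brick
processes that are not currently running. Once both bricks show a port, a
new `gluster volume heal data full` should be able to reach both sides.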