Sjors Gielen
2015-Jun-07 19:21 UTC
[Gluster-users] Gluster does not seem to detect a split-brain situation
Oops! I accidentally ran the command as non-root on Curacao; that's why there was no output. The actual output is:

curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

For reference, the output on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On Sun, 7 Jun 2015 at 21:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:

> I'm reading about quorums; I haven't set up anything like that yet.
>
> (In reply to Joe Julian, who responded off-list.)
>
> The output of getfattr on bonaire:
>
> bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
> getfattr: Removing leading '/' from absolute path names
> # file: export/sdb1/data/Case/21000355/studies.dat
> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
> On curacao, the command gives no output.
>
> From `gluster volume status`, it seems that while the brick
> curacao:/export/sdb1/data is online, it has no associated port number.
> Curacao can connect to the port number provided by Bonaire just fine.
> There are no firewalls on or between the two machines; they are on the
> same subnet, connected by Ethernet cables and two switches.
>
> By the way, warning messages saying "mismatching ino/dev between file X
> and handle Y" have just started appearing in
> /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, though perhaps
> they only began just now, even though I started the full self-heal hours
> ago.
>
> [2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard]
> 0-data-posix: mismatching ino/dev between file
> /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065)
> and handle
> /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
> (9190215976/2065)
>
> Thanks again!
> Sjors
>
> On Sun, 7 Jun 2015 at 19:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:
>
>> Hi all,
>>
>> I work at a small, 8-person company that uses Gluster for its primary
>> data storage. We have a volume called "data" that is replicated over
>> two servers (details below). This worked perfectly for over a year, but
>> lately we've been noticing some mismatches between the two bricks, so it
>> seems there has been some split-brain situation that is not being
>> detected or resolved. I have two questions about this:
>>
>> 1) I expected Gluster to (eventually) detect a situation like this;
>> why doesn't it?
>> 2) How do I fix this situation? I've tried an explicit 'heal', but that
>> didn't seem to change anything.
>>
>> Thanks a lot for your help!
>> Sjors
>>
>> ------8<------
>>
>> Volume & peer info: http://pastebin.com/PN7tRXdU
>>
>> curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
>> 7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
>> bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
>> 28c950a1e2a5f33c53a725bf8cd72681  /export/sdb1/data/Case/21000355/studies.dat
>>
>> # mallorca is one of the clients
>> mallorca# md5sum /data/Case/21000355/studies.dat
>> 7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat
>>
>> I expected an input/output error after reading this file, because of
>> the split-brain situation, but got none. There are no entries in the
>> GlusterFS logs of either bonaire or curacao.
>>
>> bonaire# gluster volume heal data full
>> Launching heal operation to perform full self heal on volume data has been successful
>> Use heal info commands to check status
>> bonaire# gluster volume heal data info
>> Brick bonaire:/export/sdb1/data/
>> Number of entries: 0
>>
>> Brick curacao:/export/sdb1/data/
>> Number of entries: 0
>>
>> (Same output on curacao, and hours after this the md5sums on both
>> bricks still differ.)
>>
>> curacao# gluster --version
>> glusterfs 3.6.2 built on Mar  2 2015 14:05:34
>> Repository revision: git://git.gluster.com/glusterfs.git
>> (Same version on Bonaire)
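Since both bricks report the same trusted.gfid, the handle under .glusterfs on each brick should be a hardlink of the data file, which is exactly what the "mismatching ino/dev" warning above complains about when it is not. Below is a rough sketch of a per-brick check, assuming the usual .glusterfs/<aa>/<bb>/<full-gfid> handle layout; the script name and argument convention are only illustrative, not from the thread.

#!/bin/bash
# check-handle.sh <brick-root> <path-relative-to-brick>
# Compares a file on a brick with its .glusterfs handle: for a healthy
# file the two should be hardlinks, i.e. same device and inode.
BRICK="$1"      # e.g. /export/sdb1/data
RELPATH="$2"    # e.g. Case/21000355/studies.dat
FILE="$BRICK/$RELPATH"

# Read trusted.gfid as hex (0xfb3457...) and rewrite it in the dashed
# UUID form that handle paths use.
HEX=$(getfattr -n trusted.gfid -e hex "$FILE" | sed -n 's/^trusted.gfid=0x//p')
GFID=$(echo "$HEX" | sed -E 's/(.{8})(.{4})(.{4})(.{4})(.{12})/\1-\2-\3-\4-\5/')
HANDLE="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# Print device, inode and hardlink count for both paths.
echo "file:   $(stat -c '%d %i %h' "$FILE")  $FILE"
echo "handle: $(stat -c '%d %i %h' "$HANDLE")  $HANDLE"

For example, on bonaire one would run "./check-handle.sh /export/sdb1/data Case/21000355/studies.dat"; if the device/inode pairs differ, or the link count is 1, that handle is the kind of entry the warning refers to.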
Joe Julian
2015-Jun-07 20:09 UTC
[Gluster-users] Gluster does not seem to detect a split-brain situation
(Oops... I hate it when I reply off-list.)

That warning should, imho, be an error. It is saying that the handle, which
should be a hardlink to the file, doesn't have a matching inode; it would if
it really were a hardlink. If it were me, I would run:

find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm

This would clean up any handles that are not hardlinked where they should
be, and will allow gluster to repair them.

Btw, the self-heal errors would be in glustershd.log and/or the client mount
log(s), not (usually) the brick logs.

On 06/07/2015 12:21 PM, Sjors Gielen wrote:
> [...]
> By the way, warning messages saying "mismatching ino/dev between file X
> and handle Y" have just started appearing in
> /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire [...]
>
> [2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard]
> 0-data-posix: mismatching ino/dev between file
> /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065)
> and handle
> /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
> (9190215976/2065)
> [...]
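A possible dry-run variant of the same find, for reviewing the candidates before deleting anything (it assumes GNU find for the -printf format):

# List regular files under .glusterfs whose link count is 1, i.e. handles
# that are no longer hardlinked to any data file, with inode and link count:
find /export/sdb1/data/.glusterfs -type f -links 1 -printf '%i %n %p\n'

If that list looks sane, the find | xargs rm pipeline above removes exactly those entries, and a subsequent lookup or heal should let gluster recreate the handles.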
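On the detection question itself, two more places worth checking, sketched below; the split-brain form of heal info should be available in the 3.6 CLI, glustershd.log is the self-heal daemon log mentioned above, and the dashed gfid is simply the 0xfb3457... value rewritten as a UUID.

# Ask gluster which entries it currently flags as split-brain:
gluster volume heal data info split-brain

# Search the self-heal daemon log (on bonaire/curacao) and the client
# mount logs (on a client such as mallorca) for the file's gfid:
grep -i fb345749-74cf-4804-b8b8-0789738c0f81 /var/log/glusterfs/glustershd.log
grep -i fb345749-74cf-4804-b8b8-0789738c0f81 /var/log/glusterfs/*.log

Note that with the trusted.afr changelog xattrs all zero on curacao and absent on bonaire, AFR has nothing pending to act on, which would explain why heal info keeps reporting zero entries even though the file contents differ.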