Joe Julian
2016-May-05 18:44 UTC
[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
FYI, that's not "no activity". The file is clearly changing. The dirty state flipping back and forth between 1 and 0 is a byproduct of writes occurring. The clients set the flag, do the write, then clear the flag. My guess is that's why it's only "possibly" undergoing self-heal. The write may have still been pending at the moment of the check.

On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
> There are 2 hosts involved and we have a replica value of 2. The hosts are called n1c1cl1 and n1c2cl1. Below is the info you requested. The file name in gluster is "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
>
> -- From the n1c1cl1 brick --
>
> [root at n1c1cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> -rwxr--r--. 2 root root 3.7G May 5 12:10 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>
> [root at n1c1cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> getfattr: Removing leading '/' from absolute path names
> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> trusted.afr.dirty=0xe68000000000000000000000
> trusted.bit-rot.version=0x020000000000000057196a8d000e1606
> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>
> -- From the n1c2cl1 brick --
>
> [root at n1c2cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> -rwxr--r--. 2 root root 3.7G May 5 12:16 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>
> [root at n1c2cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> getfattr: Removing leading '/' from absolute path names
> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> trusted.afr.dirty=0xd38000000000000000000000
> trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>
> The "trusted.afr.dirty" value is changing about 2 or 3 times a minute on both files. Let me know if you need further info, and thanks.
>
> Richard Klein
> RSI
>
> From: Ravishankar N [mailto:ravishankar at redhat.com]
> Sent: Wednesday, May 04, 2016 8:52 PM
> To: Richard Klein (RSI); gluster-users at gluster.org
> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>
>> On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
>>> First time e-mailer to the group, greetings all. We are using Gluster 3.7.6 in Cloudstack on CentOS 7 with KVM. Gluster is our primary storage. All is going well, but we have a test VM whose QCOW2 volume gets stuck in the "Possibly undergoing healing" state. By stuck I mean it stays in that state for over 24 hrs. This is a test VM with no activity on it, and we have removed the swap file on the guest as well, thinking that may be causing high I/O. All the tools show that the VM is basically idle with low I/O. The only way I can clear it up is to power the VM off, move the QCOW2 volume off the Gluster mount and then back (basically remove and recreate it), then power the VM back on. Once I do this process all is well again, but then it happens again on the same volume/file.
>>>
>>> One additional note: I have even powered off the VM completely and the QCOW2 file still stays in this state.
>>
>> When this happens, can you share the output of the extended attributes of the file in question from all the bricks of the replica in which the file resides?
>> `getfattr -d -m . -e hex /path/to/bricks/file-name`
>>
>> Also, what is the size of this VM image file?
>>
>> Thanks,
>> Ravi
>>
>>> Is there a way to stop/abort or force the heal to finish? Any help with a direction would be appreciated.
>>>
>>> Thanks,
>>>
>>> Richard Klein
>>> RSI

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
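[Editor's aside: the `trusted.afr.dirty` values shown above are 12-byte AFR changelog xattrs. Assuming the commonly documented layout of three big-endian 32-bit counters (pending data, metadata, and entry operations), they can be decoded with a short sketch like this; the exact counter semantics may vary by Gluster version.]

```python
# Sketch: decode a 12-byte AFR changelog xattr (e.g. trusted.afr.dirty).
# Assumed layout: three big-endian 32-bit counters for pending data,
# metadata, and entry operations.
import struct

def decode_afr_xattr(hex_value: str) -> dict:
    """Decode a hex xattr string as printed by `getfattr -e hex`."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# A non-zero 'data' counter indicates writes in flight on that brick:
print(decode_afr_xattr("0x000000010000000000000000"))
# → {'data': 1, 'metadata': 0, 'entry': 0}
```

A value that keeps flipping, as described above, means clients are repeatedly setting and clearing these counters around each write.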
Richard Klein (RSI)
2016-May-05 19:52 UTC
[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
I agree there is activity, but it's very low I/O, like updating log files. It shouldn't be high enough I/O to keep the file permanently in the "Possibly undergoing healing" state for days. But just to make sure, I powered off the VM so there is no activity at all now, and the "trusted.afr.dirty" value is still changing. I will leave the VM powered off until tomorrow. I agree with you that it shouldn't be, but that is my dilemma.

Thanks for the insight,

Richard Klein
RSI

> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
> Sent: Thursday, May 05, 2016 1:44 PM
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>
> FYI, that's not "no activity". The file is clearly changing. The dirty state flipping back and forth between 1 and 0 is a byproduct of writes occurring. The clients set the flag, do the write, then clear the flag.
> My guess is that's why it's only "possibly" undergoing self-heal. The write may have still been pending at the moment of the check.
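[Editor's aside: the "leave it powered off and see if the xattr still changes" check above can be automated. A minimal sketch, run as root on the brick server itself; the brick path is taken from the thread and the sampling parameters are arbitrary. Uses Linux-only `os.getxattr`.]

```python
# Sketch: periodically sample a brick file's xattr to see whether it is
# still changing while the VM is powered off. Distinct samples imply
# something is still writing to the file.
import os
import time

def watch_xattr(path, name="trusted.afr.dirty", samples=6, interval=10):
    """Return the raw hex value of the xattr at each sampling interval."""
    seen = []
    for _ in range(samples):
        try:
            seen.append(os.getxattr(path, name).hex())
        except OSError:
            seen.append(None)  # file missing or xattr not set
        time.sleep(interval)
    return seen

# Example (path from the thread above, run on the brick host):
# watch_xattr("/data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687")
```

If every sample is identical with the VM off, the heal should be able to complete; if the values keep changing, some client still holds the file open and is writing to it.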