Joe Julian
2016-May-05 18:44 UTC
[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
FYI, that's not "no activity". The file is clearly changing. The dirty state flipping back and forth between 1 and 0 is a byproduct of writes occurring. The clients set the flag, do the write, then clear the flag. My guess is that's why it's only "possibly" undergoing self-heal. The write may have still been pending at the moment of the check.

On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
> There are 2 hosts involved and we have a replica value of 2. The hosts are called n1c1cl1 and n1c2cl1. Below is the info you requested. The file name in gluster is "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
>
> -- From the n1c1cl1 brick --
>
> [root at n1c1cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> -rwxr--r--. 2 root root 3.7G May 5 12:10 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>
> [root at n1c1cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> getfattr: Removing leading '/' from absolute path names
> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> trusted.afr.dirty=0xe68000000000000000000000
> trusted.bit-rot.version=0x020000000000000057196a8d000e1606
> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>
> -- From the n1c2cl1 brick --
>
> [root at n1c2cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> -rwxr--r--. 2 root root 3.7G May 5 12:16 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>
> [root at n1c2cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> getfattr: Removing leading '/' from absolute path names
> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> trusted.afr.dirty=0xd38000000000000000000000
> trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>
> The "trusted.afr.dirty" value is changing about 2 or 3 times a minute on both files. Let me know if you need further info, and thanks.
>
> Richard Klein
> RSI
>
> From: Ravishankar N [mailto:ravishankar at redhat.com]
> Sent: Wednesday, May 04, 2016 8:52 PM
> To: Richard Klein (RSI); gluster-users at gluster.org
> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>
>> On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
>>> First time e-mailer to the group, greetings all. We are using Gluster 3.7.6 in Cloudstack on CentOS 7 with KVM. Gluster is our primary storage. All is going well, but we have a test VM whose QCOW2 volume gets stuck in the "Possibly undergoing healing" state. By stuck I mean it stays in that state for over 24 hrs. This is a test VM with no activity on it, and we have removed the swap file on the guest as well, thinking that may be causing high I/O. All the tools show that the VM is basically idle with low I/O. The only way I can clear it up is to power the VM off, move the QCOW2 volume off the Gluster mount and then back (basically remove and recreate it), then power the VM back on. Once I do this process all is well again, but then it happens again on the same volume/file.
>>>
>>> One additional note: I have even powered off the VM completely and the QCOW2 file still stays in this state.
>>
>> When this happens, can you share the output of the extended attributes of the file in question from all the bricks of the replica in which the file resides?
>> `getfattr -d -m . -e hex /path/to/bricks/file-name`
>>
>> Also, what is the size of this VM image file?
>>
>> Thanks,
>> Ravi
>>
>>> Is there a way to stop/abort or force the heal to finish? Any help with a direction would be appreciated.
>>>
>>> Thanks,
>>>
>>> Richard Klein
>>> RSI

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
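[Editor's aside: the `trusted.afr.dirty` values shown above are 12-byte AFR changelog xattrs. Assuming the commonly documented layout of three big-endian 32-bit counters (pending data, metadata, and entry operations), they can be decoded with a short sketch like this; the exact counter semantics may vary by Gluster version.]

```python
# Sketch: decode a 12-byte AFR changelog xattr (e.g. trusted.afr.dirty).
# Assumed layout: three big-endian 32-bit counters for pending data,
# metadata, and entry operations.
import struct

def decode_afr_xattr(hex_value: str) -> dict:
    """Decode a hex xattr string as printed by `getfattr -e hex`."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# A non-zero 'data' counter indicates writes in flight on that brick:
print(decode_afr_xattr("0x000000010000000000000000"))
# → {'data': 1, 'metadata': 0, 'entry': 0}
```

A value that keeps flipping, as described above, means clients are repeatedly setting and clearing these counters around each write.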
Richard Klein (RSI)
2016-May-05 19:52 UTC
[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
I agree there is activity, but it's very low I/O, like updating log files. It shouldn't be high enough I/O to keep the file permanently in the "Possibly undergoing healing" state for days. But just to make sure, I powered off the VM so there is no activity at all now, and the "trusted.afr.dirty" value is still changing. I will leave the VM powered off until tomorrow. I agree with you that it shouldn't be, but that is my dilemma.

Thanks for the insight,

Richard Klein
RSI

> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
> Sent: Thursday, May 05, 2016 1:44 PM
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>
> FYI, that's not "no activity". The file is clearly changing. The dirty state flipping back and forth between 1 and 0 is a byproduct of writes occurring. The clients set the flag, do the write, then clear the flag.
> My guess is that's why it's only "possibly" undergoing self-heal. The write may have still been pending at the moment of the check.
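[Editor's aside: the "leave it powered off and see if the xattr still changes" check above can be automated. A minimal sketch, run as root on the brick server itself; the brick path is taken from the thread and the sampling parameters are arbitrary. Uses Linux-only `os.getxattr`.]

```python
# Sketch: periodically sample a brick file's xattr to see whether it is
# still changing while the VM is powered off. Distinct samples imply
# something is still writing to the file.
import os
import time

def watch_xattr(path, name="trusted.afr.dirty", samples=6, interval=10):
    """Return the raw hex value of the xattr at each sampling interval."""
    seen = []
    for _ in range(samples):
        try:
            seen.append(os.getxattr(path, name).hex())
        except OSError:
            seen.append(None)  # file missing or xattr not set
        time.sleep(interval)
    return seen

# Example (path from the thread above, run on the brick host):
# watch_xattr("/data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687")
```

If every sample is identical with the VM off, the heal should be able to complete; if the values keep changing, some client still holds the file open and is writing to it.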