ABHISHEK PALIWAL
2016-Mar-14 05:06 UTC
[Gluster-users] [Gluster-devel] Query on healing process
Hi Ravishankar,

I just want to mention that this file has some properties that differ from other files: it has a fixed size, and when there is no space left in the file, the next data wraps around and is written from the top of the file. In other words, we are wrapping the data within this file.

So I just want to know: will this behaviour of the file affect Gluster's ability to identify split-brain, or the xattr attributes?

Regards,
Abhishek

On Fri, Mar 4, 2016 at 7:00 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
> On Fri, Mar 4, 2016 at 6:36 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>> On 03/04/2016 06:23 PM, ABHISHEK PALIWAL wrote:
>>> Ok, just to confirm, glusterd and other brick processes are running
>>> after this node rebooted?
>>> When you run the above command, you need to check
>>> /var/log/glusterfs/glfsheal-volname.log for errors. Setting
>>> client-log-level to DEBUG would give you a more verbose message.
>>
>> Yes, glusterd and the other brick processes are running fine. I have
>> checked the /var/log/glusterfs/glfsheal-volname.log file without
>> log-level=DEBUG. Here are the logs from that file:
>>
>> [2016-03-02 13:51:39.059440] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2016-03-02 13:51:39.072172] W [MSGID: 101012] [common-utils.c:2776:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info [No such file or directory]
>> [2016-03-02 13:51:39.072228] W [MSGID: 101081] [common-utils.c:2810:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
>> [2016-03-02 13:51:39.072583] E [socket.c:2278:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)
>>
>> Not sure why ^^ occurs. You could try flushing iptables (iptables -F),
>> restarting glusterd, and running the heal info command again.
>
> No hint from the logs? I'll try your suggestion.
>
>> [2016-03-02 13:51:39.072663] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]
>> [2016-03-02 13:51:39.072700] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]
>>
>>> # gluster volume heal c_glusterfs info split-brain
>>> c_glusterfs: Not able to fetch volfile from glusterd
>>> Volume heal failed.
>>>
>>> And based on your observation I understand that this is not a
>>> split-brain problem, but *is there any way to find the files that are
>>> not in split-brain yet also not in sync?*
>>
>> `gluster volume heal c_glusterfs info split-brain` should give you
>> files that need heal.
>>
>> Sorry, I meant 'gluster volume heal c_glusterfs info' should give you
>> the files that need heal, and 'gluster volume heal c_glusterfs info
>> split-brain' the list of files in split-brain.
>> The commands are detailed in
>> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md
>
> Yes, I have tried this as well. It also gives "Number of entries: 0",
> meaning no healing is required, but the file
> /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> is not in sync; both bricks show different versions of this file.
>
> You can see it in the getfattr output as well:
>
> # getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> getfattr: Removing leading '/' from absolute path names
> # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
> trusted.afr.c_glusterfs-client-8=*0x000000060000000000000000*  // because client-8 is the latest client in our case, and the first 8 digits, 00000006..., say there is something in the changelog data
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
>
> # lhsh 002500 getfattr -m . -d -e hex /opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> getfattr: Removing leading '/' from absolute path names
> # file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> trusted.afr.c_glusterfs-client-1=*0x000000000000000000000000*  // and here we can say that there is no split-brain, but the file is out of sync
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x000000000000001156d86c290005735c
> trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae

--
Regards
Abhishek Paliwal
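For readers following the xattr values above: a `trusted.afr.*` value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, and entry changelog, in that order), which is why the leading `00000006` in the client-8 value indicates pending data operations. A minimal sketch (not an official Gluster tool) to decode such a value:

```python
# Minimal sketch (not an official Gluster tool): decode a trusted.afr.* value
# as printed by `getfattr -e hex`. The 12-byte value packs three big-endian
# 32-bit pending counters: data, metadata, and entry changelog.

def decode_afr_xattr(hex_value: str):
    """Return (data, metadata, entry) pending counts,
    e.g. '0x000000060000000000000000' -> (6, 0, 0)."""
    h = hex_value[2:] if hex_value.startswith("0x") else hex_value
    if len(h) != 24:
        raise ValueError("expected 12 bytes (24 hex digits) of AFR changelog")
    return tuple(int(h[i:i + 8], 16) for i in (0, 8, 16))

# The value shown for client-8 above: 6 pending data operations,
# no pending metadata or entry operations.
print(decode_afr_xattr("0x000000060000000000000000"))  # (6, 0, 0)
# The all-zero values on the other clients: nothing pending.
print(decode_afr_xattr("0x000000000000000000000000"))  # (0, 0, 0)
```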
Ravishankar N
2016-Mar-14 08:07 UTC
[Gluster-users] [Gluster-devel] Query on healing process
On 03/14/2016 10:36 AM, ABHISHEK PALIWAL wrote:
> Hi Ravishankar,
>
> I just want to mention that this file has some properties that differ
> from other files: it has a fixed size, and when there is no space left
> in the file, the next data wraps around and is written from the top of
> the file.
>
> In other words, we are wrapping the data within this file.
>
> So I just want to know: will this behaviour of the file affect
> Gluster's ability to identify split-brain, or the xattr attributes?

Hi,
No, it shouldn't matter at what offset the writes happen. The xattrs only track that a write was missed (and is therefore a pending heal), irrespective of (offset, length).
Ravi
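Ravi's point — that the pending xattrs count missed operations rather than byte ranges — can be illustrated with a toy model of AFR's pre-op/post-op accounting (a deliberate simplification, not actual Gluster code; the names are invented for illustration):

```python
# Toy model (a simplification, not actual Gluster code) of AFR's
# pre-op/post-op accounting. Each live replica keeps one "blame" counter
# per replica, analogous to the trusted.afr.<volume>-client-N xattrs in the
# getfattr output earlier in this thread. A write's offset and length are
# never recorded: the counters only say how many operations a peer missed.

class Replica:
    def __init__(self, name, all_names):
        self.name = name
        self.up = True
        # pending data-changelog counter per replica in the set
        self.pending = {n: 0 for n in all_names}

def replicated_write(replicas, offset, length):
    live = [r for r in replicas if r.up]
    # Pre-op: every live replica marks the write as pending for everyone.
    for r in live:
        for name in r.pending:
            r.pending[name] += 1
    # Post-op: un-blame the replicas that actually took the write.
    done = {r.name for r in live}
    for r in live:
        for name in done:
            r.pending[name] -= 1

names = ["brick-0", "brick-1"]
bricks = [Replica(n, names) for n in names]
bricks[1].up = False                                # one replica goes down

replicated_write(bricks, offset=0, length=512)      # write at the start
replicated_write(bricks, offset=10**6, length=512)  # write far into the file
replicated_write(bricks, offset=0, length=512)      # wrap back to the top

# The surviving brick blames brick-1 for 3 missed writes; where in the
# file those writes landed is irrelevant.
print(bricks[0].pending)  # {'brick-0': 0, 'brick-1': 3}
```

This is why the wrap-around writes in the log file change nothing for heal detection: each missed write bumps the counter by one regardless of its position in the file.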