thr3ads.net - Gluster users - [Gluster-users] [Centos7x64] Geo-replication problem glusterfs 3.7.0-2 [May 2015]


Aravinda

2015-May-25 06:00 UTC

[Gluster-users] [Centos7x64] Geo-replication problem glusterfs 3.7.0-2

This looks like a GFID conflict issue, not the tarssh issue.

_GMaster: ENTRY FAILED: ({'uid': 0,
'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152,
'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb',
'op': 'CREATE'}, 2)

     Data: {'uid': 0,
            'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94',
            'gid': 0,
            'mode': 33152,
            'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb',
            'op': 'CREATE'}

     and Error: 2
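
A side note on the 'mode' field: it appears to be the decimal st_mode of the entry, which is easier to read in octal. A minimal sketch (the helper name mode_octal is made up for illustration and is not part of gluster):

```shell
# Convert a decimal st_mode value, as printed in the 'mode' field, to octal.
mode_octal() {
    printf '%o\n' "$1"
}

mode_octal 33152   # regular file (0100000) with permissions 0600
mode_octal 16832   # directory (0040000) with permissions 0700
```

This matches the operations in the log: the entries with mode 33152 are CREATE (files) and those with mode 16832 are MKDIR (directories).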

While creating "main.mdb", the RPC failed with error number 2, i.e. ENOENT.
This error occurs when the parent directory does not exist on the Slave, or
exists there with a different GFID.
In this case the parent GFID "874799ef-df75-437b-bc8f-3fcd58b54789" does
not exist on the Slave.
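
To see how many distinct parent directories are affected, the parent GFIDs can be pulled out of the ENTRY FAILED records. This is only a sketch: it assumes each record sits on a single line, as it would in the log file itself (not wrapped as in this mail), and the function name is hypothetical.

```shell
# Read geo-rep master log text on stdin and print "<count> <parent-gfid>"
# for every parent directory named in an ENTRY FAILED record.
failed_parent_gfids() {
    grep 'ENTRY FAILED' \
        | grep -o "'entry': '\.gfid/[0-9a-f-]*/" \
        | sed "s|^'entry': '\.gfid/||; s|/\$||" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

It would be fed the master-side geo-rep log on stdin, for example failed_parent_gfids < LOGFILE, where LOGFILE stands for whichever file contains the lines above (its location is not given in this thread).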


To fix the issue,
-----------------
1. Find the parent directory of "main.mdb" on the Master.
2. Get the GFID of that directory using getfattr.
3. Check the GFID of the same directory on the Slave (to confirm that the
   GFIDs are different).
4. Delete that directory on the Slave.
5. On the Master, set the virtual xattr on that directory and on all the files
   inside it:
     setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <DIR>
     setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>


Geo-rep will then recreate the directory with the proper GFID and start syncing.
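
When a directory holds many files, the setfattr step can be applied to the whole tree in one pass. The following is a rough sketch, not a tested procedure: the function name, the dry-run/apply switch and the example paths are all assumptions made for illustration, and glusterfs.gfid.string is the virtual xattr commonly used to read a GFID on a volume mount (it is not mentioned in this thread).

```shell
# Walk DIR and handle every entry below it (DIR included).
# Default is a dry run that only lists the paths; pass "apply" as the second
# argument to really call setfattr (needs root on a Master volume mount).
trigger_sync_tree() {
    dir=$1
    mode=${2:-dry-run}
    find "$dir" -exec sh -c '
        mode=$1
        shift
        for path do
            if [ "$mode" = apply ]; then
                setfattr -n glusterfs.geo-rep.trigger-sync -v "1" "$path"
            else
                printf "would trigger sync: %s\n" "$path"
            fi
        done
    ' sh "$mode" {} +
}

# Hypothetical paths, for comparing the directory GFID on both sides first:
#   getfattr -n glusterfs.gfid.string /mnt/master-vol/path/to/dir   # Master mount
#   getfattr -n glusterfs.gfid.string /mnt/slave-vol/path/to/dir    # Slave mount
```

Because find hands each path to the inner shell as a separate argument, names containing spaces are passed through intact. Running the dry run first and reviewing the listed paths before using "apply" seems prudent.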

Let us know if you need any help.

--
regards
Aravinda



On 05/25/2015 10:54 AM, Kotresh Hiremath Ravishankar wrote:
> Hi Wodel,
>
> Is the sync mode tar over ssh (i.e., is config use_tarssh set to true)?
> If yes, there is a known issue with it, and a patch is already up in master.
>
> It can be resolved in either of two ways.
>
> 1. If tar over ssh is the required sync mode, just disable sync_xattrs, which
>    is true by default.
>
>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config sync_xattrs false
>
> 2. If it is OK to change the sync mode to rsync, please do.
>
>      gluster vol geo-rep <master-vol> <slave-host>::<slave-vol> config use_tarssh false
>
> NOTE: rsync supports syncing of ACLs and xattrs, whereas tar over ssh does not.
>       In 3.7.0-2, tar over ssh should be used with sync_xattrs set to false.
>
> Hope this helps.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
>> From: "wodel youchi" <wodel.youchi at gmail.com>
>> To: "gluster-users" <gluster-users at gluster.org>
>> Sent: Sunday, May 24, 2015 3:31:38 AM
>> Subject: [Gluster-users] [Centos7x64] Geo-replication problem glusterfs 3.7.0-2
>>
>> Hi,
>>
>> I have two gluster servers in replicated mode as MASTERS
>> and one server for replicated geo-replication.
>>
>> I've updated my glusterfs installation to 3.7.0-2 on all three servers.
>>
>> I've recreated my slave volumes.
>> I've started the geo-replication; it worked for a while, and now I have some
>> problems:
>>
>> 1- Files/directories are not deleted on the slave.
>> 2- New files/directories are not synced to the slave.
>>
>> I have these lines on the active master
>>
>> [2015-05-23 06:21:17.156939] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'e529a399-756d-4cb1-9779-0af2822a0d94', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.mdb', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.158066] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'b4bffa4c-2e88-4b60-9f6a-c665c4d9f7ed', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.hdb', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.159154] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '9920cdee-6b87-4408-834b-4389f5d451fe', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.db', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.160242] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '307756d2-d924-456f-b090-10d3ff9caccb', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/main.ndb', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.161283] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '69ebb4cb-1157-434b-a6e9-386bea81fc1d', 'gid': 0, 'mode': 33152, 'entry': '.gfid/874799ef-df75-437b-bc8f-3fcd58b54789/COPYING', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.162368] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '7d132fda-fc82-4ad8-8b6c-66009999650c', 'gid': 0, 'mode': 33152, 'entry': '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/daily.cld', 'op': 'CREATE'}, 2)
>> [2015-05-23 06:21:17.163718] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': 'd8a0303e-ba45-4e45-a8fd-17994c34687b', 'gid': 0, 'mode': 16832, 'entry': '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-54acc14b44e696e1cfb4a75ecc395fe0', 'op': 'MKDIR'}, 2)
>> [2015-05-23 06:21:17.165102] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '49d42bf6-3146-42bd-bc29-e704927d6133', 'gid': 0, 'mode': 16832, 'entry': '.gfid/f6f2582e-0c5c-4cba-943a-6d5f64baf340/clamav-debec3aa6afe64bffaee8d099e76f3d4', 'op': 'MKDIR'}, 2)
>> [2015-05-23 06:21:17.166147] W [master(/mnt/brick2/brick):792:log_failures] _GMaster: ENTRY FAILED: ({'uid': 0, 'gfid': '1ddb93ae-3717-4347-910f-607afa67cdb0', 'gid': 0, 'mode': 33152, 'entry': '.gfid/49d42bf6-3146-42bd-bc29-e704927d6133/clamav-704a1e9a3e2c97ccac127632d7c6b8e4', 'op': 'CREATE'}, 2)
>>
>>
>> On the slave there are a lot of lines like this:
>>
>> [2015-05-22 07:53:57.071999] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 25833: /.gfid/03a5a40b-c521-47ac-a4e3-916a6df42689 => -1 (Operation not permitted)
>>
>>
>> On the active master I have 3.7 GB of XSYNC-CHANGELOG.xxxxxxx files in
>>
>> /var/lib/misc/glusterfsd/data2/ssh%3A%2F%2Froot%4010.10.10.10%3Agluster%3A%2F%2F127.0.0.1%3Aslavedata2/e55761a256af4acfe9b4a419be62462a/xsync
>>
>> I don't know if this is normal.
>>
>> any idea?
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users

wodel youchi

2015-May-25 12:25 UTC


[Gluster-users] [Centos7x64] Geo-replication problem glusterfs 3.7.0-2

Hi, and thanks for your replies.

For Kotresh: No, I am not using tar over ssh for my geo-replication.

For Aravinda: I had to recreate my slave volume from scratch and restart the
geo-replication.

If I have thousands of files with this problem, do I have to execute the
fix for all of them? Is there an easier way?
Can checkpoints help me in this situation?
And more importantly, what can cause this problem?

I am syncing containers, which contain a lot of small files. Would tar over
ssh be more suitable for that?


PS: I tried to execute this command on the Master:

bash generate-gfid-file.sh localhost:data2 $PWD/get-gfid.sh /tmp/master_gfid_file.txt

but I got errors for files that have blanks (spaces) in their names,
for example: Admin Guide.pdf

The script sees two files, "Admin" and "Guide.pdf", and get-gfid.sh then
returns "no such file or directory" errors.
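
A file name being split at a space usually means that a script iterates over an unquoted list of paths. The contents of generate-gfid-file.sh are not shown in this thread, so the following is only a generic sketch of a whitespace-safe pattern, with a hypothetical helper name, and not a patch for that script:

```shell
# Run a command once per regular file under DIR, passing each path as a
# single argument, so "Admin Guide.pdf" is never split into two words.
for_each_file() {
    dir=$1
    shift
    find "$dir" -type f -exec "$@" {} \;
}
```

For example, for_each_file <DIR> basename prints every file name under <DIR> on its own line, spaces included. Any per-file command can be substituted for basename, as long as it accepts the path as its last argument.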

thanks.


