Kingsley
2014-Sep-26 23:15 UTC
[Gluster-users] geo-replication fails on CentOS 6.5, gluster v 3.5.2
Hi,

I'm new to gluster so forgive me if I'm being an idiot. I've searched the list archives back to May but haven't found the exact issue I've come across, so I thought I'd ask on here.

Firstly, I'd like to thank the people working on this project. I've found gluster to be pretty simple to get going, and it seems to work pretty well so far. It looks like it will be a good fit for the application I have in mind, if we can get geo-replication to work reliably.

Now on to my problem ...

I've set up an additional gluster volume and configured geo-replication to replicate the master volume to it, using the instructions here:

https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md

To keep things simple while it was all new to me and I was just testing, I didn't want to add confusion by thinking about non-privileged accounts, mountbroker and so on, so I just set everything up to use root.

Anyway, I mounted the master volume and the slave on a client machine (I didn't modify the content of the slave volume; I just mounted it so that I could check things were working).

When I manually create or delete a few files and wait 60 seconds for replication to do its thing, it seems to work fine. However, when I hit it with a script to simulate intense user activity, geo-replication breaks.

I deleted the geo-replication session and removed the slave volume, then re-created and re-enabled geo-replication several times so that I could start again from scratch. Each time, my script (which just creates, renames and deletes files in the master volume via a glusterfs mount) runs for barely a minute before geo-replication breaks. I tried this with the slave volume containing just one brick, and also with it containing two bricks replicating each other. Each time, it broke.
On the slave, I noticed that the geo-replication logs contained entries like these:

[2014-09-26 16:32:23.995539] W [fuse-bridge.c:1214:fuse_err_cbk] 0-glusterfs-fuse: 6384: SETXATTR() /.gfid/5f9b6d20-a062-4168-9333-8d28f2ba2d57 => -1 (File exists)
[2014-09-26 16:32:23.995798] W [client-rpc-fops.c:256:client3_3_mknod_cbk] 0-gv2-slave-client-0: remote operation failed: File exists. Path: <gfid:855b5eda-f694-487c-adae-a4723fe6c316>/msg000002
[2014-09-26 16:32:23.996042] W [fuse-bridge.c:1214:fuse_err_cbk] 0-glusterfs-fuse: 6385: SETXATTR() /.gfid/855b5eda-f694-487c-adae-a4723fe6c316 => -1 (File exists)
[2014-09-26 16:32:24.550009] W [fuse-bridge.c:1911:fuse_create_cbk] 0-glusterfs-fuse: 6469: /.gfid/05a27020-5931-4890-9b74-a77cb1aca918 => -1 (Operation not permitted)
[2014-09-26 16:32:24.550533] W [defaults.c:1381:default_release] (-->/usr/lib64/glusterfs/3.5.2/xlator/mount/fuse.so(+0x1e7d0) [0x7fb2ebd1e7d0] (-->/usr/lib64/glusterfs/3.5.2/xlator/mount/fuse.so(free_fuse_state+0x93) [0x7fb2ebd07063] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x10e) [0x7fb2eef36fbe]))) 0-fuse: xlator does not implement release_cbk

I also noticed that at some point, rsync was returning error code 23.

Now ... I noted from the page I linked above that it requires rsync version 3.0.7, and the version that ships with CentOS 6.5 is, wait for it ... 3.0.6. Is this going to be the issue, or is the problem something else?

If you need more logs, let me know. If you need a copy of my client script that breaks it, let me know and I'll send it along.

--
Cheers,
Kingsley.
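For reference, a churn script of the kind described above can be sketched in a few lines. This is a hypothetical reconstruction, not Kingsley's actual script; MOUNT would normally point at the master volume's glusterfs FUSE mount, but it defaults to a scratch directory here so the sketch is self-contained:

```shell
#!/bin/sh
# Sketch of rapid create/rename/delete cycles (hypothetical, assumes
# MOUNT is the master volume's FUSE mount; defaults to a temp dir).
MOUNT="${MOUNT:-$(mktemp -d)}"

i=0
while [ "$i" -lt 100 ]; do
    f="$MOUNT/msg$(printf '%06d' "$i")"
    echo "payload $i" > "$f"       # create
    mv "$f" "$f.renamed"           # rename almost immediately
    rm "$f.renamed"                # delete shortly afterwards
    i=$((i + 1))
done
echo "completed $i create/rename/delete cycles in $MOUNT"
```

Each cycle finishes in well under the 60-second replication interval, which is exactly the short-lived-file pattern that stresses geo-replication.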
James Payne
2014-Sep-27 22:48 UTC
[Gluster-users] geo-replication fails on CentOS 6.5, gluster v 3.5.2
Not sure, but is this the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=1141379 ?

I have seen similar behaviour, but in my case it showed up when using Samba: every time a user created a folder (Windows calls it "New Folder") and renamed it quickly, the geo-replicated copy instantly became incorrect.

James
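The Explorer behaviour James describes can be approximated from a shell. SHARE is a hypothetical path; it would normally sit inside a Samba share backed by the geo-replicated volume, but it defaults to a scratch directory here so the sketch runs anywhere:

```shell
#!/bin/sh
# Approximation of the Windows Explorer sequence: a folder is created
# as "New Folder", then renamed moments later when the user types a
# real name. SHARE is hypothetical; defaults to a temp dir.
SHARE="${SHARE:-$(mktemp -d)}"

mkdir "$SHARE/New Folder"                 # Explorer's initial create
mv "$SHARE/New Folder" "$SHARE/reports"   # immediate rename by the user
ls "$SHARE"
```

The create and rename land in the same changelog window, which is the trigger condition for the rename bug referenced above.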
Aravinda
2014-Sep-29 08:20 UTC
[Gluster-users] geo-replication fails on CentOS 6.5, gluster v 3.5.2
On 09/27/2014 04:45 AM, Kingsley wrote:
> However, when I hit it with a script to simulate intense user activity,
> geo-replication breaks. [...] Each time, my script (which just creates,
> renames and deletes files in the master volume via a glusterfs mount)
> runs for barely a minute before geo-replication breaks.

Do these fops involve renames and deletes of the same files? Geo-rep had an issue with short-lived renamed files (now fixed in master: http://review.gluster.org/#/c/8761/).

> On the slave, I noticed that the geo-replication logs contained entries
> like these:
>
> [2014-09-26 16:32:23.995539] W [fuse-bridge.c:1214:fuse_err_cbk] 0-glusterfs-fuse: 6384: SETXATTR() /.gfid/5f9b6d20-a062-4168-9333-8d28f2ba2d57 => -1 (File exists)
> [...]

The "File exists" errors can be ignored; they are soft errors which are already handled in geo-replication.

> I also noticed that at some point, rsync was returning error code 23.

The above-mentioned patch also handles error code 23.

> Now ... I noted from the page I linked above that it requires rsync
> version 3.0.7 and the version that ships with CentOS 6.5 is, wait for
> it ... 3.0.6. Is this going to be the issue, or is the problem something
> else?

No issue with the rsync version.

--
regards
Aravinda
http://aravindavk.in
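To see why short-lived renamed files are awkward for changelog-based replication, here is a toy model. This is not Gluster's actual changelog implementation, just a sketch of the race: a CREATE is recorded, the file is renamed before the sync pass runs, and the recorded path no longer exists when the slave tries to fetch it (rsync reports that class of failure as a partial transfer, exit code 23):

```shell
#!/bin/sh
# Toy illustration of the short-lived-rename race (not Gluster's real
# mechanism). WORK is hypothetical and defaults to a temp dir.
WORK="${WORK:-$(mktemp -d)}"
LOG="$WORK/changelog.txt"

# Application activity: create, then quickly rename.
touch "$WORK/msg000001";            echo "CREATE msg000001" >> "$LOG"
mv "$WORK/msg000001" "$WORK/done";  echo "RENAME msg000001 done" >> "$LOG"

# A later sync pass replays the changelog; the CREATE entry now points
# at a path that no longer exists on the master.
while read -r op name rest; do
    if [ "$op" = "CREATE" ] && [ ! -e "$WORK/$name" ]; then
        echo "sync: $name recorded but missing (already renamed)"
    fi
done < "$LOG"
```

The fix referenced above teaches geo-replication to tolerate this case instead of treating the missing source as a hard failure.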