Samuli Heinonen
2013-Mar-20 18:58 UTC
[Gluster-users] Geo-replication broken in 3.4 alpha2?
Dear all,

I'm running GlusterFS 3.4 alpha2 together with oVirt 3.2. This is solely a test system and it doesn't have much data or anything important in it. Currently it has only 2 VMs running and disk usage is around 15 GB. I have been trying to set up geo-replication for disaster recovery testing. For geo-replication I did the following:

All machines are running CentOS 6.4 and using GlusterFS packages from http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.0alpha2/EPEL.repo/. Gluster bricks are using XFS. On the slave I have tried ext4 and btrfs.

1. Installed the slave machine (a VM hosted in a separate environment) with glusterfs-geo-replication, rsync and some other packages pulled in as dependencies.
2. Installed the glusterfs-geo-replication and rsync packages on the GlusterFS server.
3. Created an ssh key on the server, saved it to /var/lib/glusterd/geo-replication/secret.pem and copied it to /root/.ssh/authorized_keys on the slave.
4. On the server ran:
   - gluster volume geo-replication vmstorage slave:/backup/vmstorage config remote_gsyncd /usr/libexec/glusterfs/gsyncd
   - gluster volume geo-replication vmstorage slave:/backup/vmstorage start

After that the geo-replication status was "starting..." for a while and then it switched to "N/A". I set the log level to DEBUG and saw lines like these appearing every 10 seconds:

[2013-03-20 18:48:19.417107] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive(None,) ...
[2013-03-20 18:48:19.418431] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive -> 34
[2013-03-20 18:48:29.427959] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive(None,) ...
[2013-03-20 18:48:29.429172] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive -> 35

I thought that maybe it was creating an index or something like that, so I let it run for about 30 hours. Still, after that there were no new log messages and no data being transferred to the slave. I tried using strace -p 27756 to see what was going on, but there was no output at all. My next thought was that maybe the running virtual machines were causing trouble, so I shut down all VMs and restarted geo-replication, but it didn't have any effect. My last effort was to create a new, clean volume without any data in it and try geo-replication with it - no luck there either.

I also did a quick test with a master running GlusterFS 3.3.1 and it had no problems copying data to exactly the same slave server.

There isn't much documentation available about geo-replication, and before filing a bug report I'd like to hear whether anyone else has used geo-replication successfully with 3.4 alpha or if I'm missing something obvious.

Output of gluster volume info:

Volume Name: vmstorage
Type: Distributed-Replicate
Volume ID: a800e5b7-089e-4b55-9515-c9cc72502aea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: mc1.ovirt.local:/gluster/brick0/vmstorage
Brick2: mc5.ovirt.local:/gluster/brick0/vmstorage
Brick3: mc1.ovirt.local:/gluster/brick1/vmstorage
Brick4: mc5.ovirt.local:/gluster/brick1/vmstorage
Options Reconfigured:
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.remote-dio: enable
geo-replication.indexing: on
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 10
nfs.disable: on

Best regards,
Samuli Heinonen
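For reference, the setup in steps 3-4 and the status / log-level checks mentioned above could look roughly like the following shell session. This is a minimal sketch: only the two commands in step 4 are quoted from the message; the ssh-keygen/ssh-copy-id invocations and the log-level config option name are assumptions based on the 3.3/3.4-era CLI and should be verified against your version.

    # Step 3 (assumed invocation): create the key gsyncd will use and install it on the slave
    ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem -N ''
    ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem.pub root@slave

    # Step 4, as quoted above
    gluster volume geo-replication vmstorage slave:/backup/vmstorage \
        config remote_gsyncd /usr/libexec/glusterfs/gsyncd
    gluster volume geo-replication vmstorage slave:/backup/vmstorage start

    # Check session status and raise the gsyncd log level to DEBUG
    # (option name per the 3.3-era admin guide; may differ in other releases)
    gluster volume geo-replication vmstorage slave:/backup/vmstorage status
    gluster volume geo-replication vmstorage slave:/backup/vmstorage config log-level DEBUG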
Looks like you may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=905871

Can you gdb to the client process (on the master) and give the backtrace?

-venky

On Thursday 21 March 2013 12:28 AM, Samuli Heinonen wrote:
> [...]
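One way to collect the backtrace Venky is asking for is sketched below. Note that 27756 is the gsyncd worker PID from the debug log above; the glusterfs client process for the geo-replication session may have a different PID, so identify it first (e.g. with ps), and to get a usefully symbolized trace you will likely also need the matching glusterfs debuginfo packages installed.

    # Attach gdb to the process, dump backtraces of all threads, then detach
    gdb -p <PID> -batch -ex 'thread apply all bt'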
Hi Samuli,

On 2013-03-20, Samuli Heinonen <samppah at neutraali.net> wrote:
> [...]

This behavior is confirmed -- it's exactly reproducible. I'll try to get back to you tomorrow with an update. If that doesn't happen (because I'm not getting any cleverer...) then I can chime back only after the 4th of April; I'll be on leave.

Regards,
Csaba