Samuli Heinonen
2013-Mar-20 18:58 UTC
[Gluster-users] Geo-replication broken in 3.4 alpha2?
Dear all,

I'm running GlusterFS 3.4 alpha2 together with oVirt 3.2. This is solely a test system and it doesn't have much data or anything important in it. Currently it has only 2 VMs running and disk usage is around 15 GB. I have been trying to set up geo-replication for disaster recovery testing. For geo-replication I did the following:

All machines are running CentOS 6.4 and using GlusterFS packages from http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.0alpha2/EPEL.repo/. Gluster bricks are using XFS. On the slave I have tried ext4 and btrfs.

1. Installed the slave machine (a VM hosted in a separate environment) with glusterfs-geo-replication, rsync and some other packages pulled in as dependencies.
2. Installed the glusterfs-geo-replication and rsync packages on the GlusterFS server.
3. Created an ssh key on the server, saved it to /var/lib/glusterd/geo-replication/secret.pem and copied it to /root/.ssh/authorized_keys on the slave.
4. On the server ran:
   - gluster volume geo-replication vmstorage slave:/backup/vmstorage config remote_gsyncd /usr/libexec/glusterfs/gsyncd
   - gluster volume geo-replication vmstorage slave:/backup/vmstorage start

After that the geo-replication status was "starting..." for a while and then it switched to "N/A". I set the log level to DEBUG and saw lines like these appearing every 10 seconds:

[2013-03-20 18:48:19.417107] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive(None,) ...
[2013-03-20 18:48:19.418431] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive -> 34
[2013-03-20 18:48:29.427959] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive(None,) ...
[2013-03-20 18:48:29.429172] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive -> 35

I thought that maybe it was creating an index or something like that, so I let it run for about 30 hours. Still, after that there were no new log messages and no data being transferred to the slave. I tried using strace -p 27756 to see what was going on, but there was no output at all. My next thought was that maybe the running virtual machines were causing trouble, so I shut down all VMs and restarted geo-replication, but it didn't have any effect. My last effort was to create a new, clean volume without any data in it and try geo-replication with it - no luck there either.

I also did a quick test with a master running GlusterFS 3.3.1 and it had no problems copying data to exactly the same slave server.

There isn't much documentation available about geo-replication, and before filing a bug report I'd like to hear whether anyone else has used geo-replication successfully with 3.4 alpha or if I'm missing something obvious.

Output of gluster volume info:

Volume Name: vmstorage
Type: Distributed-Replicate
Volume ID: a800e5b7-089e-4b55-9515-c9cc72502aea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: mc1.ovirt.local:/gluster/brick0/vmstorage
Brick2: mc5.ovirt.local:/gluster/brick0/vmstorage
Brick3: mc1.ovirt.local:/gluster/brick1/vmstorage
Brick4: mc5.ovirt.local:/gluster/brick1/vmstorage
Options Reconfigured:
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.remote-dio: enable
geo-replication.indexing: on
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 10
nfs.disable: on

Best regards,
Samuli Heinonen
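For reference, the setup in steps 3-4 and the status / log-level checks mentioned above could look roughly like the following shell session. This is a minimal sketch: only the two commands in step 4 are quoted from the message; the ssh-keygen/ssh-copy-id invocations and the log-level config option name are assumptions based on the 3.3/3.4-era CLI and should be verified against your version.

    # Step 3 (assumed invocation): create the key gsyncd will use and install it on the slave
    ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem -N ''
    ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem.pub root@slave

    # Step 4, as quoted above
    gluster volume geo-replication vmstorage slave:/backup/vmstorage \
        config remote_gsyncd /usr/libexec/glusterfs/gsyncd
    gluster volume geo-replication vmstorage slave:/backup/vmstorage start

    # Check session status and raise the gsyncd log level to DEBUG
    # (option name per the 3.3-era admin guide; may differ in other releases)
    gluster volume geo-replication vmstorage slave:/backup/vmstorage status
    gluster volume geo-replication vmstorage slave:/backup/vmstorage config log-level DEBUG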
Looks like you may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=905871

Can you gdb to the client process (on the master) and give the backtrace?

-venky

On Thursday 21 March 2013 12:28 AM, Samuli Heinonen wrote:
> [...]
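One way to collect the backtrace Venky is asking for is sketched below. Note that 27756 is the gsyncd worker PID from the debug log above; the glusterfs client process for the geo-replication session may have a different PID, so identify it first (e.g. with ps), and to get a usefully symbolized trace you will likely also need the matching glusterfs debuginfo packages installed.

    # Attach gdb to the process, dump backtraces of all threads, then detach
    gdb -p <PID> -batch -ex 'thread apply all bt'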
Hi Samuli,

On 2013-03-20, Samuli Heinonen <samppah at neutraali.net> wrote:
> [...]

This behavior is confirmed -- it's exactly reproducible. I'll try to get back to you tomorrow with an update. If that doesn't happen (because I'm not getting any cleverer...) then I can chime back only after the 4th of April; I'll be on leave.

Regards,
Csaba