Tony Maro
2013-Jul-08 18:04 UTC
[Gluster-users] Possible to preload data on a georeplication target? First sync taking forever...
I have about 4 TB of data in a Gluster mirror configuration on top of ZFS, mostly consisting of 20KB files. I've added a georeplication target (using an SSH destination) and the sync started OK. It ran pretty quickly for a while, but it has now taken over 2 weeks to sync just under 1 TB of data to the target server, and it appears to be getting slower. The two servers are connected to the same switch on a private Gigabit Ethernet segment, so the bottleneck is not the network; I haven't physically moved the georeplication target to the other end of the WAN yet. I really don't want to wait another 6 weeks (or worse) for the initial full sync to finish before shipping the server out.

Is it possible to manually rsync the data over myself to give it its starting position? If so, what steps should I take? In other words: break replication, delete the index, any special rsync flags I should use when copying the data over myself, etc.?

For reference, before anyone asks, the source brick that's running the georeplication is reporting the following:

top - 14:01:55 up 3:55, 1 user, load average: 0.31, 0.74, 0.85
Tasks: 221 total, 1 running, 220 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.6%us, 2.9%sy, 0.0%ni, 83.2%id, 10.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 12297148k total, 12147752k used, 149396k free, 11684k buffers
Swap: 93180k total, 0k used, 93180k free, 3201636k cached

  PID USER PR NI VIRT RES SHR  S %CPU %MEM TIME+    COMMAND
 1711 root 20  0 835m 28m 2484 S  155  0.2 38:25.90 glusterfsd

CPU usage for glusterfsd bounces between around 20% and 160%.

Thanks,
Tony
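For illustration, this is roughly the kind of manual copy I had in mind. The paths and host name below are made up, and the flag choice is only my guess at what would preserve enough metadata (ownership, hardlinks, ACLs, xattrs); I don't know whether geo-replication will actually pick up from a pre-populated slave:

    # Hypothetical: copy from a FUSE mount of the master volume on this box
    # to a FUSE mount of the slave volume on the target box, over SSH.
    rsync -aHAX --numeric-ids --progress /mnt/master-vol/ geo-target:/mnt/slave-vol/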
Jan Hudoba
2013-Jul-19 16:52 UTC
[Gluster-users] Possible to preload data on a georeplication target? First sync taking forever...
Hi,

I don't know whether you can preload the data (if you can, I'd like to know how as well). But maybe you can try setting the following option and then stopping and starting geo-replication; it can be faster:

    gluster volume geo-replication $VOL ssh://$USER@$GEO_HOST::$GEO_VOL config sync-jobs 4

I'd suggest choosing the number based on the CPU cores / RAID disks on the slave. Beware that it will put a correspondingly higher load on the slave.

--
S pozdravom / Yours sincerely
Ing. Jan Hudoba
http://www.facebook.com/jan.hudoba
http://www.jahu.sk
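P.S. For completeness, the stop / reconfigure / start cycle I mean, with $VOL, $USER, $GEO_HOST and $GEO_VOL as placeholders (the status command is only there to confirm the session came back up):

    gluster volume geo-replication $VOL ssh://$USER@$GEO_HOST::$GEO_VOL stop
    gluster volume geo-replication $VOL ssh://$USER@$GEO_HOST::$GEO_VOL config sync-jobs 4
    gluster volume geo-replication $VOL ssh://$USER@$GEO_HOST::$GEO_VOL start
    gluster volume geo-replication $VOL ssh://$USER@$GEO_HOST::$GEO_VOL status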