Brian Ericson
2015-Nov-04 17:30 UTC
[Gluster-users] CHANGELOGs and new geo-replica sync taking forever
tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!

I have about 125G spread over just shy of 5000 files that I'm replicating with geo-replication to nodes around the world. The content is fairly stable and probably hasn't changed at all since I initially established the GlusterFS nodes/network, which looks as follows:

    x -> xx -> [xxx, xxy]  (x geo-replicates to xx, xx geo-replicates to xxx/xxy)

Latency & throughput are markedly different (x -> xx is the fastest, xx -> xxx the slowest, at about 1G/hour). That said, all nodes were synced within 5 days of setting up the network.

I have since added another node, xxz, which is also geo-replicated from xx (xx -> xxz). Its latency/throughput is clearly better than xx -> xxx's, but 5+ days later, I'm still replicating CHANGELOGs and haven't gotten to any real content (the replicated volumes' mounted filesystems are empty).

Starting with x, you can see I have a "reasonable" number of CHANGELOGs:

    x # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
    186

However, xxz's source is xx, and I've got a real problem with xx:

    xx # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
    193450

5+ days into this, and I've hardly managed to dent this on xxz:

    xxz # find /bricks/*/.glusterfs/changelogs -name CHANGELOG\* | wc -l
    43211

On top of that, xx is generating new CHANGELOGs at a rate of ~6/minute (two volumes at ~3/minute each), so chasing CHANGELOGs is a (quickly) moving target.

And these files are small! The "I'm alive" file is 92 bytes long; I've also seen them average about 4k.
Demonstrating latency/throughput, you can see that small files (for me) are a real killer:

    ### x -> xx (fastest route)
    # for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
    1 (3984k): 0m4.777s
    10 (398k): 0m10.737s
    100 (39k): 0m53.286s
    1000 (3k): 7m21.493s

    ### xx -> xxx (slowest route)
    # for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
    1 (3984k): 0m11.065s
    10 (398k): 0m41.007s
    100 (39k): 4m52.814s
    1000 (3k): 39m23.009s

    ### xx -> xxz (the route I've added and am trying to sync)
    # for i in 1 10 100 1000; do file="$( dd if=/dev/urandom bs=1024 count=$((4000/i)) 2> /dev/null )"; echo "$i ($(( $( echo -n "$file" | wc -c )/1024 ))k): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"; done
    1 (3984k): 0m2.673s
    10 (398k): 0m16.333s
    100 (39k): 2m0.676s
    1000 (3k): 17m28.265s

What you're looking at is the cost of transferring a total of 4000k: 1 transfer at 4000k, 10 at 400k, 100 at 40k, and 1000 at 4k. With 1 transfer at under 3s and 1000 transfers at nearly 17 1/2 minutes on xx -> xxz for the same total transfer size, it's really a killer to transfer CHANGELOGs, especially almost 200,000 of them.
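For what it's worth, those xx -> xxz numbers let you estimate the per-connection overhead directly. A rough sketch (plain POSIX shell + awk, using only the timings above, and assuming the data-transfer cost is roughly constant since the total payload is the same 4000k in every run):

```shell
#!/bin/sh
# 1 transfer of 4000k took 2.673s; 1000 transfers of the same total payload
# took 17m28.265s (= 1048.265s). Attribute the difference to the 999 extra
# ssh connections to estimate the setup cost of each one.
total_1000=1048.265
total_1=2.673
overhead=$(awk -v a="$total_1000" -v b="$total_1" 'BEGIN { printf "%.3f", (a - b) / 999 }')
echo "approx. per-connection overhead on xx -> xxz: ${overhead}s"
```

That works out to about one second per connection, which by itself caps a one-file-per-connection scheme at roughly 60 files/minute regardless of file size. If connection setup really is the dominant cost, ssh connection multiplexing (ControlMaster/ControlPersist in ssh_config) or pushing many small files through a single stream would amortize it.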
And 92-byte files don't improve this:

    ### x -> xx (fastest route)
    # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
    100 (92): 0m34.164s

    ### xx -> xxx (slowest route)
    # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxx 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
    100 (92): 3m53.388s

    ### xx -> xxz (the route I've added and am trying to sync)
    # file="$( dd if=/dev/urandom bs=92 count=1 2> /dev/null )"; i=100; echo "$i ($(( $( echo -n "$file" | wc -c ) ))): $( ( time for i in $( seq 1 $i ); do echo -n "$file" | ssh xxz 'cat > /dev/null'; done ) |& awk '/^real/{ print $2 }' )"
    100 (92): 1m43.389s

Questions...:

o Why so many CHANGELOGs?

o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs, so 43211/5/24/60=6 implies a real transfer rate of about 6 CHANGELOG files per minute, which brings me back to xx's generating new ones at about that rate...)?

o What can I do to "fix" this?
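(The rate arithmetic in the second question checks out with integer division, step by step:)

```shell
#!/bin/sh
# 43211 CHANGELOGs synced in 5 days -> files per minute, using the same
# integer division as 43211/5/24/60 above.
synced=43211
days=5
per_min=$(( synced / days / 24 / 60 ))
echo "${per_min} CHANGELOGs/minute"   # i.e. barely above the rate xx creates new ones
```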
Wade Fitzpatrick
2015-Nov-05 00:46 UTC
[Gluster-users] CHANGELOGs and new geo-replica sync taking forever
I also had problems getting geo-replication working correctly and eventually gave up on it due to project time constraints.

What version of Gluster? What is the topology of x, xx, and xxx/xxy/xxz?

I tried a 2x2 stripe-replica with geo-replication to a 2x1 stripe using 3.7.4. Starting replication with 32 GB of small files never completed; it failed several times. Starting replication with an empty volume and then filling it with a rate limit of 2000k/s managed to keep sync until completion, but could not handle the rate of change under normal usage.

On 5/11/2015 3:30 AM, Brian Ericson wrote:
> tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Aravinda
2015-Nov-05 06:16 UTC
[Gluster-users] CHANGELOGs and new geo-replica sync taking forever
Thanks for the detailed mail.

Is the geo-replication status showing Faulty? Please share the output of `gluster volume geo-replication status`. It looks like geo-replication is halted due to some unrecoverable error during replication. Please share the log files from the Master and Slave nodes so we can root-cause the issue.

> o Why so many CHANGELOGs?

One changelog is generated every 15 seconds, but only if changes happened in that brick.

> o Why so slow (in 5 days, I've transferred 43211 CHANGELOGs...)?

Changelogs are not transferred. The changelogs seen on the Slave nodes are generated by the Slave volume itself, since changelog is enabled for the Slave volume as well. Changelogs are parsed on the Master, and files are replicated to the Slave volume in two steps:
1. Entry creation using RPC
2. Data sync using rsync

> o What can I do to "fix" this?

Please share the log files; we will look into the issue and help resolve it.

regards
Aravinda

On 11/04/2015 11:00 PM, Brian Ericson wrote:
> tl;dr -- geo-replication of ~200,000 CHANGELOG files is killing me... Help!
> I have since added another node, xxz, which is also geo-replicated from xx (xx -> xxz).
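(Side note: Aravinda's 15-second interval puts a hard ceiling on how fast changelogs can accumulate per brick. A quick consistency check, plain shell arithmetic only, no Gluster commands assumed:)

```shell
#!/bin/sh
# Ceiling implied by one changelog per 15-second interval (per brick, and
# only for intervals in which the brick actually saw changes).
interval=15
per_min=$(( 60 / interval ))      # at most 4 changelogs/minute
per_day=$(( 86400 / interval ))   # at most 5760 changelogs/day
echo "ceiling per brick: ${per_min}/minute, ${per_day}/day"
```

Brian's observed ~3/minute per volume sits just under that ceiling, consistent with near-continuous small writes (e.g. the periodic "I'm alive" updates he mentions) rather than a runaway bug; xx's 193450 changelogs correspond to roughly 193450/5760 ≈ 34 brick-days of such activity.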