Brian Camp
2015-Jun-29 13:46 UTC
[Gluster-users] Geo-replicated disk images losing sparseness
Hi,

I have a libvirt/KVM environment that uses three Gluster servers for disk storage. Two of the servers house the replicated volume with the VM disk images, and the third is offsite and used for geo-replication. All of the hardware is the same and fairly high end (RAID10, SAS, 2x Xeons), running CentOS 7, XFS, and Gluster 3.6.2 from the CentOS Storage SIG. The VM images themselves are about 1.5TB of sparse files that take up around 430GB on disk.

After things were set up, I noticed that geo-replication was taking much longer than expected, even though the amount of on-disk change was small and the link was 100 Mbit. The primary bottleneck seems to be rsync on both ends of the geo-replication, which consumes CPU as it checksums the disk images. Several optimizations helped, such as raising sync_jobs and changing rsync's compression level, but not enough for the geo-replication to keep up.

The disk images are losing sparseness when geo-replicated. On both replicas the disk images take up 430GB, but on the geo-replicated server they take up the full 1.5TB. I've tried several different configs on the servers and blown away/restarted the replication several times, but they always end up at 1.5TB. This results in the full 1.5TB being read and checksummed on each sync, which is very slow. Worse, the two sides of the sync seem to take turns: rsync will chew away on one of the replicas for a while with the geo-replication server idle, and then vice versa.

I notice that rsync is being called with the sparse flag (-S) and --inplace. The rsync manual says under the --sparse section that the two are incompatible, but it doesn't specify the behavior when they are called together.

I'm not sure if the sparse issue is a bug or if I've missed something in the configuration?

Thanks
-Brian
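For anyone wanting to reproduce the measurement, the gap between apparent and allocated size described above can be checked with du; the filename below is illustrative (on the real systems you would point du at the image files on the bricks):

```shell
# Create a sparse test file and compare its apparent (logical) size
# with its allocated (on-disk) size.
truncate -s 100M sparse.img

# Apparent size in bytes: 104857600
du --apparent-size --block-size=1 sparse.img

# Allocated size in bytes: close to zero, since the file is all holes.
# A de-sparsified copy would show the full 104857600 here instead.
du --block-size=1 sparse.img
```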
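As a possible stopgap on the slave (not something from the thread, just a sketch): an image that has been fully allocated can have its holes re-punched, assuming util-linux's fallocate with -d/--dig-holes (util-linux >= 2.25) and a filesystem that supports hole punching, which XFS does. The path is hypothetical:

```shell
# Re-sparsify a fully-allocated image by scanning for runs of zeros
# and deallocating them. This does not fix the rsync behavior, it only
# reclaims space after the fact. Path is a placeholder.
fallocate --dig-holes /bricks/geo/vm-disk.img
```

Note this would have to be re-run after every sync if geo-replication keeps writing the file out fully allocated, so it treats the symptom rather than the -S/--inplace interaction itself.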