Graham Leggett
2024-May-15 09:54 UTC
rsync whole file transfers extremely slow over SSH - but only in a particular virtual guest
Hi all,

I am trying to get to the bottom of a strange rsync performance problem. On a specific guest OS, and only on this guest OS, rsync is giving modem-like transfer speeds. This happens on both delta transfers and whole file transfers.

[root at arnie ~]# rsync -avz --progress --sparse arnie.example.com:/home/backup/example/cuttysark.example.com/var-lib-libvirt-images-snapshot/ /home/backup/example/cuttysark.example.com/var-lib-libvirt-images-snapshot/
receiving incremental file list
./
.timestamp
              0 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=23/25)
3cx.example.com.img
  8,589,934,592 100%  771.32kB/s    3:01:15 (xfr#2, to-chk=22/25)   <---- s l o w
machine.example.img
 14,458,254,200  29%  208.50kB/s   45:06:35
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [generator=3.1.3]
rsync: [generator] write error: Broken pipe (32)   <---- s l o w e r

[root at arnie images]# rsync --verbose --progress --compress --sparse --copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8 test.img
dm-8
     47,825,017 -2147483648%  587.01kB/s   ??:??:??   <---- s l o w

The above transfer started off at full speed for a few seconds, then suddenly dropped to treacle-slow.

First, we check the network underneath rsync. We're tunnelling rsync through ssh, so we test bandwidth with iperf over an ssh tunnel, and we get the expected speed for this link.

[root at arnie ~]# iperf -c [::1]:5001 -V
------------------------------------------------------------
Client connecting to ::1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  1] local ::1 port 42324 connected with ::1 port 5001
^C[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-19.28 sec  11.9 MBytes  5.19 Mbits/sec   <---- fast enough

Then we check the disk underneath rsync:

[root at arnie images]# dd if=/dev/urandom of=random.img count=1024 bs=10M status=progress
1604321280 bytes (1.6 GB, 1.5 GiB) copied, 16 s, 100 MB/s^C
159+0 records in
159+0 records out
1667235840 bytes (1.7 GB, 1.6 GiB) copied, 16.7261 s, 99.7 MB/s   <---- fast enough

Then the RAM under the guest:

[root at arnie ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.6Gi       2.0Gi       394Mi       3.0Mi       1.2Gi       1.3Gi   <---- 1.3GB seems fine?
Swap:         923Mi        89Mi       834Mi

Then we check rsync running on the hypervisor of this guest:

[root at emma images]# rsync --verbose --progress --compress --sparse --copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8 test.img
dm-8
    173,316,344 -2147483648%    6.07MB/s   ??:??:??   <---- fast enough

On an identically sized guest in a different datacenter:

[root at arnie images]# rsync --verbose --progress --compress --sparse --copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8 /home/backup/example/blackadder.example.com/vg001-var_lib_libvirt_images-snapshot.img
dm-8
    170,053,588 -2147483648%   10.64MB/s   ??:??:??   <---- fast enough

From what I can see:

- The guest OS networking is fine, including bandwidth over ssh.
- The hypervisor rsync works fine.
- A guest OS in another datacenter works fine.
- CPU is not being maxed out on either the source or the target, and iowait is not showing anything significant. RAM usage seems modest.
- rsync on the guest is v3.1.3 on Rocky 8.

I am somewhat stuck. Google is of no help: it's all "it might be this, it might be that", but I've eliminated everything I have found and still the problem remains. Has anyone encountered anything like this before?

Regards,
Graham

--
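For context, the iperf test above runs against a local ssh port forward; the tunnel setup itself is not shown in the message. A minimal sketch of how such a tunnel is typically set up (the remote hostname and the port here are assumptions, not values taken from the thread):

    # on the remote host, run an iperf server (assumed to be listening on port 5001)
    iperf -s -p 5001

    # on the guest, forward local port 5001 to the remote iperf server over ssh
    ssh -N -L 5001:localhost:5001 blackadder.example.com

    # then measure bandwidth through the tunnel from the guest
    iperf -c ::1 -p 5001 -V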
Paul Slootman
2024-May-15 13:25 UTC
rsync whole file transfers extremely slow over SSH - but only in a particular virtual guest
On Wed 15 May 2024, Graham Leggett via rsync wrote:

> Then we check the disk underneath rsync:
>
> [root at arnie images]# dd if=/dev/urandom of=random.img count=1024 bs=10M status=progress
> 1604321280 bytes (1.6 GB, 1.5 GiB) copied, 16 s, 100 MB/s^C
> 159+0 records in
> 159+0 records out
> 1667235840 bytes (1.7 GB, 1.6 GiB) copied, 16.7261 s, 99.7 MB/s   <---- fast enough

I would try this again with the block size that rsync is using, which will be way less than 10MB. It could be that the VM is limited in the number of IOPS, which is slowing rsync down.

If that is the problem, using --whole-file might help, as that stops rsync "wasting" IOPS on reading the existing files, and it may also help with the IO block size.

Using 'top' while rsync is running may help to see if rsync is IO bound; the "wa" (wait IO) column will show a large percentage then.

You can also run strace to profile rsync and see where most wall clock time is spent:

    strace -w -c

You'll have to do this on each rsync process.

Paul
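To make the suggestions above concrete, a rough sketch of the retest (the 128K block size, the oflag=direct flag and the rsync PID are illustrative assumptions, not values from the thread):

    # repeat the disk test with a block size much closer to what rsync writes (assumed ~128K),
    # bypassing the page cache so any per-guest IOPS limit is actually exercised
    dd if=/dev/urandom of=random.img bs=128k count=10000 oflag=direct status=progress

    # watch the "wa" (IO wait) column while a slow rsync transfer is running
    top

    # attach to one of the running rsync processes and summarise wall clock time per syscall
    strace -w -c -p <rsync-pid>

If throughput collapses with small direct-IO blocks while the original 10M-block test stays fast, that points at an IOPS cap on the guest rather than a raw bandwidth problem.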