Graham Leggett
2024-May-15 09:54 UTC
rsync whole file transfers extremely slow over SSH - but only in a particular virtual guest
Hi all,
I am trying to get to the bottom of a strange rsync performance problem.
On a specific guest OS, and only on this guest OS, rsync is giving modem-like
transfer speeds. This happens on both delta transfers and whole-file transfers.
[root@arnie ~]# rsync -avz --progress --sparse
arnie.example.com:/home/backup/example/cuttysark.example.com/var-lib-libvirt-images-snapshot/
/home/backup/example/cuttysark.example.com/var-lib-libvirt-images-snapshot/
receiving incremental file list
./
.timestamp
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=23/25)
3cx.example.com.img
8,589,934,592 100% 771.32kB/s 3:01:15 (xfr#2, to-chk=22/25) <---- s l o w
machine.example.img
14,458,254,200 29% 208.50kB/s 45:06:35 ^C
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [generator=3.1.3]
rsync: [generator] write error: Broken pipe (32) <---- s l o w e r
[root@arnie images]# rsync --verbose --progress --compress --sparse
--copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8
test.img
dm-8
47,825,017 -2147483648% 587.01kB/s ??:??:?? <---- s l o w
The above transfer started off for a few seconds at full speed, but then
suddenly dropped to treacle slow.
First, we check the network underneath rsync. We're tunnelling rsync through
ssh, so we test bandwidth with iperf over an ssh tunnel, and we get the
expected speed for this link.
[root@arnie ~]# iperf -c [::1]:5001 -V
------------------------------------------------------------
Client connecting to ::1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 1] local ::1 port 42324 connected with ::1 port 5001
^C[ ID] Interval Transfer Bandwidth
[ 1] 0.00-19.28 sec 11.9 MBytes 5.19 Mbits/sec <---- fast enough
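For completeness, a tunnel for this kind of test can be wired up along these lines. The hostname and port below are illustrative placeholders, not the exact commands used, so the block only prints them rather than executing anything:

```shell
# Illustrative sketch (hostname is a placeholder): forward a local port to
# an iperf server on the remote end over ssh, then run the iperf client
# against the local end of the tunnel. The remote side runs "iperf -s".
TUNNEL_CMDS='ssh -f -N -L 5001:localhost:5001 root@arnie.example.com
iperf -c ::1 -p 5001 -V'
printf '%s\n' "$TUNNEL_CMDS"
```

Measuring through the tunnel rather than over the raw link matters here, since it exercises the same ssh encryption path that rsync uses.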
Then we check the disk underneath rsync:
[root@arnie images]# dd if=/dev/urandom of=random.img count=1024 bs=10M status=progress
1604321280 bytes (1.6 GB, 1.5 GiB) copied, 16 s, 100 MB/s^C
159+0 records in
159+0 records out
1667235840 bytes (1.7 GB, 1.6 GiB) copied, 16.7261 s, 99.7 MB/s <---- fast enough
Then the RAM under the guest:
[root@arnie ~]# free -h
total used free shared buff/cache available
Mem: 3.6Gi 2.0Gi 394Mi 3.0Mi 1.2Gi 1.3Gi <---- 1.3GB seems fine?
Swap: 923Mi 89Mi 834Mi
Then we check rsync running on the hypervisor of this guest:
[root@emma images]# rsync --verbose --progress --compress --sparse
--copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8
test.img
dm-8
173,316,344 -2147483648% 6.07MB/s ??:??:?? <---- fast enough
On an identically sized guest in a different datacenter:
[root@arnie images]# rsync --verbose --progress --compress --sparse
--copy-devices --partial --whole-file --inplace blackadder.example.com:/dev/dm-8
/home/backup/example/blackadder.example.com/vg001-var_lib_libvirt_images-snapshot.img
dm-8
170,053,588 -2147483648% 10.64MB/s ??:??:?? <---- fast enough
From what I can see:
- The guest OS networking is fine, including bandwidth over ssh.
- The hypervisor rsync works fine.
- A guest OS in another datacenter works fine.
- CPU is not being maxed out on either the source or the target. iowait is
not showing anything significant. RAM usage seems modest.
- rsync on the guest is v3.1.3 on Rocky8.
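(For anyone following along: the iowait observation can be checked with something like the following. This is a generic sketch, not the exact command used in this investigation.)

```shell
# Read iowait directly from /proc/stat: on the aggregate "cpu" line, the
# 6th field is cumulative iowait time in USER_HZ ticks since boot.
awk '/^cpu /{print "iowait ticks since boot:", $6}' /proc/stat
# Interactive alternatives: "vmstat 1" (wa column) or "top" (%Cpu wa field).
```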
I am somewhat stuck.
Google is of no help; it's all "it might be this, it might be
that", but I've eliminated everything I have found and still the
problem remains.
Has anyone encountered anything like this before?
Regards,
Graham
--
Paul Slootman
2024-May-15 13:25 UTC
rsync whole file transfers extremely slow over SSH - but only in a particular virtual guest
On Wed 15 May 2024, Graham Leggett via rsync wrote:

> Then we check the disk underneath rsync:
>
> [root@arnie images]# dd if=/dev/urandom of=random.img count=1024 bs=10M status=progress
> 1604321280 bytes (1.6 GB, 1.5 GiB) copied, 16 s, 100 MB/s^C
> 159+0 records in
> 159+0 records out
> 1667235840 bytes (1.7 GB, 1.6 GiB) copied, 16.7261 s, 99.7 MB/s <---- fast enough

I would try this again with the block size that rsync is using, which will be way less than 10MB. It could be that the VM is limited in the number of IOPS, which is slowing rsync down.

If that is the problem, using --whole-file might help, as that stops rsync "wasting" IOPS on reading the existing files, and may help with the IO block size.

Using 'top' while rsync is running may help to see if rsync is IO bound; the "wa" (wait IO) column will show a large percentage then.

You can also run strace to profile rsync to see where most wall clock time is spent:

strace -w -c

You'll have to do this on each rsync process.

Paul
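Paul's suggestions boil down to something like the following sketch. The file path, sizes, and PID are placeholders, not values from the thread:

```shell
# Sketch of the suggested follow-ups; path and sizes are placeholders.

# 1. Repeat the dd read test with a small block size. If the VM is capped
#    on IOPS rather than bandwidth, small blocks will show far lower MB/s
#    than the earlier bs=10M run. (Drop the page cache first, e.g.
#    "echo 3 > /proc/sys/vm/drop_caches" as root, or the re-read will be
#    served from RAM.)
dd if=/dev/zero of=/tmp/iops-test.img bs=1M count=16 2>/dev/null
dd if=/tmp/iops-test.img of=/dev/null bs=4k 2>&1 | tail -n 1
rm -f /tmp/iops-test.img

# 2. While a slow transfer runs, attach strace to each rsync process
#    (generator, sender, receiver) for a wall-clock syscall summary:
#    strace -w -c -p <rsync-pid>
```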