Hi, I want to increase the speed of rsync transfers over ssh.

1. The need

TL;DR: we need to transfer one huge file quickly (100Gb/s) through ssh.

I'm working at CEA (Alternative Energies and Atomic Energy Commission) in
France. We have a compute cluster complex, and our customers regularly need
to transfer big files to and from the cluster. The bandwidth between the
customers and us is generally between 10Gb/s and 100Gb/s. The file system
used is LustreFS, which can handle more than 100Gb/s read and write. One
security constraint is the use of ssh for every connection.

In order to maximize transfer speed, I first tried different Ciphers / MACs
(a quick benchmark sketch follows this message). The list of authorized
Ciphers / MACs is provided to me by our security team. With these
constraints, I can reach 1Gb/s to 3Gb/s. I'm still far from the expected
result. This is due to the way encryption/decryption works on modern CPUs:
it is really efficient thanks to AES-NI, but single-threaded. The bandwidth
limiter is the speed of a single CPU core.

So the next step is: just use parallel or xargs with rsync (a second sketch
follows). And it works like a charm in most cases. But not in the compute
cluster case. As I said earlier, the files are stored in LustreFS. The good
practice for this file system is to create very few, but very big, files.
And with the way compute clusters work, you generally end up with one really
big file, often hundreds of gigabytes or even terabytes.

2. What has been done

I created parallel-sftp (https://github.com/cea-hpc/openssh-portable/). It
is just a fork of openssh's sftp which creates multiple ssh connections for
a single transfer. This way, parallelization is really simple: files are
transferred in parallel just like the parallel/xargs solution, and big files
are transferred in chunks written directly into the destination file,
created as a sparse file (a third sketch follows).

One big advantage of this solution is that it doesn't require any
server-side change. All the parallelization is done on the client side.
However, there are two caveats: there is no consistency check of the copied
files, and an interrupted transfer must be restarted from scratch, because
there is no way to know exactly which chunks of a big file have been
transferred.

3. Is rsync the best solution?

Now I'm thinking that adding parallelization to rsync is the best solution.
We could take advantage of the delta-transfer algorithm in order to transfer
only parts of a file. I can imagine a first rsync connection taking care of
detecting the diffs between the local and distant files, and then forking
(or creating threads) for the actual transfers.

The development work could be split in two parts:
- adding the possibility to transfer part of a file (from byte x to byte y);
- adding the possibility to delegate the transfers to other threads /
  processes.

What do you think about this? Does it look feasible? If I develop it, does
it have a chance to be merged upstream? I understand it's kind of a niche
use case, but I know it's a frequent need in the super-computing world. One
important thing to note is that at CEA we have the manpower and the will to
develop this functionality. We are also open to sponsoring, for development
and/or reviews.

Thank you,
--
Cyril
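First sketch: the cipher comparison described above can be approximated by
streaming zeros through ssh and reading dd's final throughput report. The
host name "remote", the cipher list, and the 10 GiB size are placeholders,
not our actual setup:

    # Rough single-core cipher throughput test. Host "remote" and the
    # cipher names are placeholders; use your authorized cipher list.
    for cipher in aes128-ctr aes256-ctr aes128-gcm@openssh.com; do
        echo "== $cipher =="
        # dd prints a bytes/s summary on stderr when the pipe closes
        dd if=/dev/zero bs=1M count=10240 | ssh -c "$cipher" remote 'cat > /dev/null'
    done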
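Second sketch: the parallel/xargs pattern in its simplest form (paths, host
name, and job count are placeholders):

    # 8 concurrent rsyncs, one per top-level entry under /data
    ls /data/ | xargs -P8 -I{} rsync -a /data/{} remote:/dest/

This works well for many medium-sized files, but does nothing for a single
huge file.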
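Third sketch: a rough shell picture of what parallel-sftp does internally
with big files. The destination is created sparse at its final size, and
each worker writes its own byte range in place. The file name, the 100G
size, and the 1 GiB chunk geometry are illustrative only:

    # Create the destination as a sparse file at its final size
    truncate -s 100G bigfile
    chunk=3    # this worker handles chunk number 3 (illustrative)
    # Fetch 1 GiB starting at chunk*1GiB and write it in place;
    # conv=notrunc keeps dd from truncating the destination file
    ssh remote "dd if=bigfile bs=1M skip=$((chunk*1024)) count=1024" \
        | dd of=bigfile bs=1M seek=$((chunk*1024)) conv=notrunc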
Robin H. Johnson
2021-Dec-16 21:32 UTC
Parallelizing rsync through multiple ssh connections
On Thu, Nov 04, 2021 at 04:58:03PM +0100, SERVANT Cyril via rsync wrote:
> Hi, I want to increase the speed of rsync transfers over ssh.

Thanks for your great email here. Having had similar issues in the past when
trying to rsync single large files, I wanted to share some of the ideas I'd
found to work:

HPN-SSH patches. The website is out of date, but don't let that put you off.
HPN-SSH can saturate 40Gbit links with tuning (but it is real work to do
that tuning). The main things there are the buffer patches and the
multithreaded AES, but you can also use the NONE cipher for benchmarking.
Intel had a paper from 2010 showing the HPN boost (and also other work on
multiple streams):
https://www.intel.com/content/dam/support/us/en/documents/network/sb/fedexcasestudyfinal.pdf

Facebook's WARP/WDT tooling:
https://github.com/facebookarchive/wdt
https://opensourcelibs.com/lib/warp-cli

Lastly, I was trying multipath TCP:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/getting-started-with-multipath-tcp_configuring-and-managing-networking
I didn't get very far on the MPTCP research angle.

I think all of these are likely to be complementary to your work on
partitioning the large file.

If you have a sample large file and permission to test without encryption,
temporarily replacing ssh with either the NONE cipher or a buffer-tuned
netcat would let you identify what rsync's bottleneck is in your situation
(a sketch follows this message). I found previously that the rsync:// wire
protocol didn't do a good job over high-latency links: it had too many round
trips and didn't do much work between them.

I think, from looking at the rsync code in the past, that the checksum
system in general is going to be your largest problem:
- it assumes that it's checking a single stream for each file;
- a meaningful replacement would be either independent per-segment checksums
  or something like a Merkle tree (a second sketch follows this message).

> 1. The need
>
> TL;DR: we need to transfer one huge file quickly (100Gb/s) through ssh.
...
> In order to maximize transfer speed, I first tried different Ciphers /
> MACs. The list of authorized Ciphers / MACs is provided to me by our
> security team. With these constraints, I can reach 1Gb/s to 3Gb/s. I'm
> still far from the expected result. This is due to the way
> encryption/decryption works on modern CPUs: it is really efficient thanks
> to AES-NI, but single-threaded. The bandwidth limiter is the speed of a
> single CPU core.

HPN-SSH's MT-AES here gets you to many cores at the SSH level.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2 at gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
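First sketch: a minimal version of the no-encryption baseline suggested
above, using netcat. The port and transfer size are arbitrary, and
traditional netcat wants "-l -p 5001" instead of "-l 5001":

    # On the receiving host: listen and discard
    nc -l 5001 > /dev/null
    # On the sending host: push 10 GiB of zeros; dd's stats line is the
    # raw TCP rate with no ssh in the path
    dd if=/dev/zero bs=1M count=10240 | nc remote 5001

Comparing this number against rsync-over-ssh on the same path shows how much
of the gap is encryption versus protocol round trips.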
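Second sketch: a toy version of the independent per-segment checksums
mentioned above. Run it on both sides and diff the output, so an interrupted
transfer only redoes the slices that differ. The file name and the 1 GiB
slice size are illustrative, and stat -c%s is the GNU form:

    f=bigfile; slice_mb=1024
    # file size rounded up to whole MiB (GNU stat)
    size_mb=$(( ( $(stat -c%s "$f") + 1048575 ) / 1048576 ))
    for off in $(seq 0 "$slice_mb" "$((size_mb - 1))"); do
        # hash one slice; line order in the output encodes the offset
        dd if="$f" bs=1M skip="$off" count="$slice_mb" 2>/dev/null | sha256sum
    done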