Cyril Servant
2020-Apr-08 15:30 UTC
Parallel transfers with sftp (call for testing / advice)
Hello, I'd like to share with you an evolution I made on sftp.

1. The need

I'm working at CEA (Commissariat à l'énergie atomique et aux énergies alternatives) in France. We have a compute cluster complex, and our customers regularly need to transfer big files from and to the cluster. Each of our front nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more limited by the CPU than by the network bandwidth), but the total interconnection to the customer is higher (let's say 10Gb/s). Each front node shares a distributed file system on an internal high bandwidth network. So the contention point is the 1Gb/s limit of a single connection. If a customer wants to use more than 1Gb/s, he currently uses GridFTP. We want to provide a solution based on ssh to our customers.

2. The solution

I made some changes in the sftp client. The new option "-n" (defaults to 0) sets the number of extra channels. There is one main ssh channel, and n extra channels. The main ssh channel does everything except the put and get commands. Put and get commands are parallelized across the n extra channels. Thanks to this, when a customer uses "-n 5", he can transfer his files at up to 5Gb/s. There is no server side change. Everything is done on the client side.

3. Some details

Each extra channel has its own ssh channel and its own thread. Orders are sent by the main channel to the threads via a queue. When the user issues a get or put request, the main channel decides what to do. If the file is small enough, one simple order is added to the queue. If the file is big, the main channel writes the last block of the file (in order to create a sparse file), then adds multiple orders to the queue. Each of these orders is a put (or get) of one chunk of the file. One notable change is the progress meter (in interactive mode): instead of one progress meter per file, there is now a single progress meter showing the name of the last dequeued file and the total number of transferred bytes.

4. Any thoughts ?

You will find the code here:
https://github.com/cea-hpc/openssh-portable/tree/parallel_sftp
The branch parallel_sftp is based on the tag V_8_2_P1. There may be a lot of newbie mistakes in the code; I'll gladly take any advice and criticism, I'm open minded. And finally, if there is even the slightest chance for these changes to be merged upstream, please show me the path.

Thank you,
--
Cyril
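[To make the chunking scheme in "3. Some details" concrete, here is a minimal sketch of the same idea in Python with paramiko. It is only an illustration of the approach described above, not the C patch itself; the host, user, paths and chunk count are placeholders.]

    import threading
    import paramiko

    HOST, USER = "germany", "me"            # placeholder endpoint
    REMOTE, LOCAL = "/files/5g", "5g.bis"   # placeholder paths
    CHANNELS = 5                            # like "-n 5"

    def fetch_chunk(offset, length):
        # Each worker gets its own SSH connection, mirroring the patch's
        # one-ssh-channel-per-thread design.
        client = paramiko.SSHClient()
        client.load_system_host_keys()
        client.connect(HOST, username=USER)
        sftp = client.open_sftp()
        with sftp.open(REMOTE, "rb") as src, open(LOCAL, "r+b") as dst:
            src.seek(offset)
            dst.seek(offset)
            left = length
            while left > 0:
                data = src.read(min(32768, left))
                if not data:
                    break
                dst.write(data)
                left -= len(data)
        client.close()

    # Size the local file up front (the patch does this by writing the
    # file's last block, which creates a sparse file).
    probe = paramiko.SSHClient()
    probe.load_system_host_keys()
    probe.connect(HOST, username=USER)
    size = probe.open_sftp().stat(REMOTE).st_size
    probe.close()
    with open(LOCAL, "wb") as f:
        f.truncate(size)

    # One order per chunk; the real client pushes these orders onto a queue
    # consumed by the worker threads.
    chunk = (size + CHANNELS - 1) // CHANNELS
    threads = [threading.Thread(target=fetch_chunk,
                                args=(i * chunk, min(chunk, size - i * chunk)))
               for i in range(CHANNELS) if i * chunk < size]
    for t in threads:
        t.start()
    for t in threads:
        t.join()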
Nico Kadel-Garcia
2020-Apr-08 22:34 UTC
Parallel transfers with sftp (call for testing / advice)
On Wed, Apr 8, 2020 at 11:31 AM Cyril Servant <cyril.servant at gmail.com> wrote:
>
> Hello, I'd like to share with you an evolution I made on sftp.

It *sounds* like you should be using parallelized rsync over xargs.
Partial sftp or scp transfers are almost inevitable in bulk transfers
over a crowded network, and sftp does not have good support for
"mirroring", only for copying content. See
https://stackoverflow.com/questions/24058544/speed-up-rsync-with-simultaneous-concurrent-file-transfers

> I'm working at CEA (Commissariat à l'énergie atomique et aux énergies
> alternatives) in France. We have a compute cluster complex, and our customers
> regularly need to transfer big files from and to the cluster. Each of our front
> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
> limited by the CPU than by the network bandwidth), but the total interconnection
> to the customer is higher (let's say 10Gb/s). Each front node shares a
> distributed file system on an internal high bandwidth network. So the contention
> point is the 1Gb/s limit of a single connection. If a customer wants to use more
> than 1Gb/s, he currently uses GridFTP. We want to provide a solution based on
> ssh to our customers.
>
> 2. The solution
>
> I made some changes in the sftp client. The new option "-n" (defaults to 0) sets
> the number of extra channels. There is one main ssh channel, and n extra
> channels. The main ssh channel does everything except the put and get commands.
> Put and get commands are parallelized across the n extra channels. Thanks to
> this, when a customer uses "-n 5", he can transfer his files at up to 5Gb/s.
> There is no server side change. Everything is done on the client side.

While the option sounds useful for niche cases, I'd be leery of
partial transfers and being compelled to replicate content to handle
partial transfers. rsync has been very good, for years, in completing
partial transfers.

> 3. Some details
> [...]
>
> 4. Any thoughts ?
> [...]
>
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
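[For reference, the xargs approach Nico mentions amounts to running one rsync per file with a bounded worker pool. A rough Python equivalent is sketched below; the host and directory names are placeholders, and the linked StackOverflow answer gives pure-shell variants of the same technique.]

    import pathlib
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def push(path):
        # --partial lets an interrupted transfer resume where it stopped.
        subprocess.run(["rsync", "-a", "--partial", str(path), "cluster:/dst/"],
                       check=True)

    # At most 4 concurrent rsync processes, one file each. This parallelizes
    # across files; it does not split a single big file.
    files = [p for p in pathlib.Path("/src").iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(push, files))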
Cyril Servant
2020-Apr-09 15:01 UTC
Parallel transfers with sftp (call for testing / advice)
> Le 9 avr. 2020 à 00:34, Nico Kadel-Garcia <nkadel at gmail.com> a écrit :
>
> On Wed, Apr 8, 2020 at 11:31 AM Cyril Servant <cyril.servant at gmail.com> wrote:
>>
>> Hello, I'd like to share with you an evolution I made on sftp.
>
> It *sounds* like you should be using parallelized rsync over xargs.
> Partial sftp or scp transfers are almost inevitable in bulk transfers
> over a crowded network, and sftp does not have good support for
> "mirroring", only for copying content.
>
> See https://stackoverflow.com/questions/24058544/speed-up-rsync-with-simultaneous-concurrent-file-transfers

This solution is perfect for sending a lot of files in parallel. But in the case of sending one really big file, it does not improve transfer speed.

>> I'm working at CEA (Commissariat à l'énergie atomique et aux énergies
>> alternatives) in France. [...]
>>
>> 2. The solution
>> [...]
>
> While the option sounds useful for niche cases, I'd be leery of
> partial transfers and being compelled to replicate content to handle
> partial transfers. rsync has been very good, for years, in completing
> partial transfers.

I can fully understand this. In our case, the network is not really crowded, as our customers generally use research / educational links. Indeed, this is totally a niche case, but it is still a need for us. The main use case is putting the data you want to process onto the cluster and, when the job is finished, getting the output of the process back. There is rarely a need for synchronising files, except for the code you want to execute on the cluster, which is considered small compared to the data. rsync is the obvious choice for synchronising the code, but not for putting / getting huge amounts of data. The only other ssh based tool that can speed up the transfer of one big file is lftp, and it only works for get commands, not for put commands.

>> 3. Some details
>> [...]
>>
>> 4. Any thoughts ?
>> [...]

Thank you,
--
Cyril
Darren Tucker
2020-Apr-09 23:55 UTC
Parallel transfers with sftp (call for testing / advice)
On Thu, 9 Apr 2020 at 01:34, Cyril Servant <cyril.servant at gmail.com> wrote:
[...]
> Each of our front
> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
> limited by the CPU than by the network bandwidth),

You might also want to experiment with the Ciphers and MACs since
these can make a significant difference in CPU utilization and, if
that's the bottleneck, your throughput. Which one is best will vary
depending on your hardware, but it's likely to be either AES GCM if
the hardware has AES instructions or chacha20-poly1305 if not.

In the first example below the bottleneck is the source's relatively
elderly 2.66GHz Intel CPU. In the second it's the gigabit network
between them.

$ scp -c aes256-ctr -o macs=hmac-sha2-512 ubuntu-18.10-desktop-amd64.iso.bz2 nuc:/tmp/
ubuntu-18.10-desktop-amd64.iso.bz2    100% 1899MB  63.5MB/s   00:29

$ scp -c chacha20-poly1305@openssh.com ubuntu-18.10-desktop-amd64.iso.bz2 nuc:/tmp/
ubuntu-18.10-desktop-amd64.iso.bz2    100% 1899MB 112.1MB/s   00:16

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
    usually comes from bad judgement.
Cyril Servant
2020-Apr-10 07:39 UTC
Parallel transfers with sftp (call for testing / advice)
> Le 10 avr. 2020 à 01:55, Darren Tucker <dtucker at dtucker.net> a écrit :
>
> On Thu, 9 Apr 2020 at 01:34, Cyril Servant <cyril.servant at gmail.com> wrote:
> [...]
>> Each of our front
>> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
>> limited by the CPU than by the network bandwidth),
>
> You might also want to experiment with the Ciphers and MACs since
> these can make a significant difference in CPU utilization and, if
> that's the bottleneck, your throughput. Which one is best will vary
> depending on your hardware, but it's likely to be either AES GCM if
> the hardware has AES instructions or chacha20-poly1305 if not.
>
> [...]

Yes, we already optimised the ciphers and MACs used, and indeed recent CPUs are pretty damn fast with some of them. But parallelising transfers gives us one more order of magnitude of transfer speed.

[me at france openssh-portable]$ sftp germany
Connected to germany.
sftp> get /files/5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g                         100% 5120MB  84.5MB/s   01:00
sftp> put 5g /files/5g
Uploading 5g to /files/5g
5g                                100% 5120MB  81.4MB/s   01:02

[me at france openssh-portable]$ ./sftp -n 10 germany
Connected main channel to germany (1.2.3.104).
Connected channel 1 to germany (1.2.3.102).
Connected channel 2 to germany (1.2.3.105).
Connected channel 3 to germany (1.2.3.103).
Connected channel 4 to germany (1.2.3.96).
Connected channel 5 to germany (1.2.3.98).
Connected channel 6 to germany (1.2.3.101).
Connected channel 7 to germany (1.2.3.97).
Connected channel 8 to germany (1.2.3.100).
Connected channel 9 to germany (1.2.3.99).
Connected channel 10 to germany (1.2.3.104).
sftp> get /files/5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g                         100% 5120MB 748.8MB/s   00:06
sftp> put 5g /files/5g
Uploading 5g to /files/5g
5g                                100% 5120MB 697.1MB/s   00:07

--
Cyril
Matthieu Hautreux
2020-May-04 22:41 UTC
Parallel transfers with sftp (call for testing / advice)
Le 10/04/2020 à 01:55, Darren Tucker a écrit :
> On Thu, 9 Apr 2020 at 01:34, Cyril Servant <cyril.servant at gmail.com> wrote:
> [...]
>> Each of our front
>> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
>> limited by the CPU than by the network bandwidth),
>
> You might also want to experiment with the Ciphers and MACs since
> these can make a significant difference in CPU utilization and, if
> that's the bottleneck, your throughput. Which one is best will vary
> depending on your hardware, but it's likely to be either AES GCM if
> the hardware has AES instructions or chacha20-poly1305 if not.
>
> [...]

Hi,

As Cyril said, we are aware of the CPU-bound aspect of the available ciphers and MACs in OpenSSH, and we have already selected the most efficient ones for our transfers after several benchmarking sessions.

Current processors have limited per-core capacity. Core frequencies have stayed roughly at the same level for many years now; only core counts are increasing, leaving it to developers to exploit parallelism in order to increase compute throughput. The future does not seem brighter in that area. In the meantime, network bandwidth has kept increasing at a regular pace. As a result, a CPU frequency that was once sufficient to fill the network pipe now delivers only a fraction of what the network can. 10GE ethernet cards are common nowadays on datacenter servers, and no OpenSSH cipher or MAC can deliver the available bandwidth for a single transfer. Introducing parallelism is thus necessary to leverage what the network hardware can offer.

The change proposed by Cyril in sftp is a very pragmatic approach to dealing with parallelism at the file transfer level. It leverages the existing sftp protocol and its capability to write/read file content at specified offsets. This speeds up sftp transfers significantly by parallelizing the SSH channels used for large transfers. The improvement touches only the sftp client, which is a very small modification compared to the OpenSSH codebase. The modification is not too complicated to review and validate (I did it) and does not change the default behavior of the CLI.

Tools exist that offer parallel transfers of large files, but we really want to use OpenSSH for this purpose because it is the only application we can fully trust (by the way, thank you for making that possible). I do not think we are the only ones to think like this, and I am pretty sure that such a change in the main code base of OpenSSH would really help users use their hardware more efficiently in various situations.

Best regards,
Matthieu
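[As a minimal illustration of the protocol property Matthieu points to, an SFTP write carries an explicit file offset, so independent connections can each fill a disjoint slice of the same remote file without coordinating. The sketch below uses paramiko with placeholder names and assumes the remote file already exists at its full size, as after the patch's sparse-file step.]

    import paramiko

    def write_slice(host, user, remote, offset, payload):
        # Each caller uses its own SSH connection. The seek() sets the
        # offset carried by the subsequent SSH2_FXP_WRITE requests, so
        # no shared state between connections is needed.
        client = paramiko.SSHClient()
        client.load_system_host_keys()
        client.connect(host, username=user)
        with client.open_sftp().open(remote, "r+") as f:
            f.seek(offset)
            f.write(payload)
        client.close()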
Hi Cyril,

Sounds like you've reinvented lftp's (http://lftp.yar.ru/) pget and mirror --parallel options. Okay, it doesn't seem to have parallel pushing, but there I'll advise the rsync parallel methods too.

> On 08 Apr. 2020, at 17:30, Cyril Servant <cyril.servant at gmail.com> wrote:
>
> I made some changes in the sftp client. The new option "-n" (defaults to 0) sets
> the number of extra channels. There is one main ssh channel, and n extra
> channels. The main ssh channel does everything except the put and get commands.
> Put and get commands are parallelized across the n extra channels. Thanks to
> this, when a customer uses "-n 5", he can transfer his files at up to 5Gb/s.
> There is no server side change. Everything is done on the client side.