On Tue, 6 Apr 2021, rapier wrote:
> On 4/6/21 10:04 PM, Damien Miller wrote:
> > On Tue, 6 Apr 2021, rapier wrote:
> >
> > > Looking at the performance - on my systems sftp seems to be a bit
> > > slower than scp when dealing with a lot of small files. Not sure
> > > why this is the case as I haven't looked at the sftp code in years.
> >
> > the OpenSSH sftp client doesn't do inter-file pipelining - it only
> > pipelines read/writes within a transfer, so each new file causes a
> > stall.
> >
> > This is all completely fixable on the client side, and shouldn't
> > apply to things like sshfs at all.
>
> Gotcha. Is this because of how it sequentially loops through the
> readdirs in the two _dir_internal functions?
Only partly - the client will do SSH2_FXP_READDIR to get the full list
of files and then transfer each file separately. The SSH2_FXP_READDIR
requests are not pipelined at all, and there is no pipelining between
obtaining the file list and the file transfers. Finally, each file
transfer incurs a pipeline stall upon completion.
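
To make the stall concrete, here is a rough sketch of that per-directory
flow (the names below are illustrative stand-ins, not the actual
sftp-client.c symbols): the READDIRs finish before any transfer starts,
and every loop iteration drains the pipeline before the next file opens.

    /*
     * Sketch of the current per-directory flow.  Requests within one
     * file are pipelined, but nothing overlaps across files or with
     * the directory listing.
     */
    struct conn;                                            /* stand-in for struct sftp_conn */
    int  list_dir(struct conn *, const char *, char ***);   /* SSH2_FXP_READDIR loop */
    void transfer_file(struct conn *, const char *);        /* pipelined reads/writes for one file */

    static void
    download_dir_sketch(struct conn *c, const char *path)
    {
            char **names;
            int i, n;

            /* blocks until every SSH2_FXP_READDIR reply has arrived */
            n = list_dir(c, path, &names);

            for (i = 0; i < n; i++) {
                    /* blocks until the last reply for this file arrives,
                     * so each iteration ends with a round-trip stall */
                    transfer_file(c, names[i]);
            }
    }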
> If so I'm wondering if you could spawn per file
> threads to get some concurrency within a directory.
I don't think we want a threaded sftp client, and AFAIK it isn't
necessary for the main problem. We could add inter-operation pipelining
by adding a
work queue structure and driving the next operation from that. This is
similar to what happens inside do_download()/do_upload() already, but
extended to persist across and between different operations.
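
As a very rough sketch of what that could look like (none of these names
exist in OpenSSH; it is just the shape of the idea), the queue keeps a
fixed number of requests in flight across file boundaries, and each
reply can enqueue follow-up work, e.g. a READDIR reply adding open/read
items for the files it returned:

    /*
     * Sketch only: a work queue that keeps requests outstanding across
     * operations instead of draining the pipeline after each file.
     */
    #include <sys/queue.h>

    struct work_item {
            TAILQ_ENTRY(work_item) entries;
            enum { OP_READDIR, OP_OPEN, OP_READ, OP_CLOSE } op;
            char *path;
            /* offsets, lengths, handles, etc. would live here */
    };
    TAILQ_HEAD(work_queue, work_item);

    void send_request(struct work_item *);      /* hypothetical: issue one SFTP request */
    void handle_one_reply(struct work_queue *); /* hypothetical: wait for one reply,
                                                 * possibly enqueueing follow-up items */

    #define MAX_OUTSTANDING 64

    static void
    run_queue_sketch(struct work_queue *q)
    {
            int outstanding = 0;
            struct work_item *w;

            while (!TAILQ_EMPTY(q) || outstanding > 0) {
                    /* keep the pipeline full across file boundaries */
                    while (outstanding < MAX_OUTSTANDING &&
                        (w = TAILQ_FIRST(q)) != NULL) {
                            TAILQ_REMOVE(q, w, entries);
                            send_request(w);
                            outstanding++;
                    }
                    handle_one_reply(q);
                    outstanding--;
            }
    }

The sliding-window part is essentially what do_download()/do_upload()
already do for reads and writes within a single file; the difference is
that the window would persist across files and directory listings.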
> Just curious, and this is the first time I've looked at the sftp code
> in years. I hope you don't mind the questions.
Not at all :)
-d