Cyril Servant
2020-May-05 10:55 UTC
Parallel transfers with sftp (call for testing / advice)
Peter Stuge wrote:
> Matthieu Hautreux wrote:
>> The change proposed by Cyril in sftp is a very pragmatic approach to
>> deal with parallelism at the file transfer level. It leverages the
>> already existing sftp protocol and its capability to write/read file
>> content at specified offsets. This enables to speed up sftp transfers
>> significantly by parallelizing the SSH channels used for large
>> transfers. This improvement is performed only by modifying the sftp
>> client, which is a very small modification compared to the openssh
>> codebase. The modification is not too complicated to review and validate
>> (I did it) and does not change the default behavior of the cli.
>
> I think you make a compelling argument. I admit that I haven't
> reviewed the patch, even though that is what matters the most.

If you want to review the code, here is a direct link to the patch:
https://github.com/openssh/openssh-portable/compare/V_8_2_P1...cea-hpc:V_8_2_P1_PSFTP1

> I guess that no one really minds ways to make SFTP scale, but ever since
> the patch was proposed I have been thinking that the parallel channel
> approach is likely to introduce a whole load of not very clean error
> conditions regarding reassembly, which need to be handled sensibly both
> within the sftp client and on the interface to outside/calling processes.
> Can you or Cyril say something about this?

Indeed, reassembly must be handled properly. Depending on the filesystem
you're writing to, the block size can vary, so having a "power of 2 bytes"
chunk is the starting point. Performance-wise, our storage experts asked us
to use chunks of 2MB or more. I chose the arbitrary value of 64MB because I
don't see any advantage in using smaller chunks: after all, we're talking
about using parallel transfers on high bandwidth links. Of course this value
can be discussed and easily changed (it's the base_chunk_size variable). To
be more precise, a chunk size is a multiple of base_chunk_size, so a 4GB file
transferred with 4 channels will be cut into four 1GB chunks.

The main source of errors during reassembly is if copy_buffer_len (the -B
option) is set to a value that is not a power of 2. This will lead to writes
sitting partially on 2 blocks, and will probably corrupt the file.
Simultaneously writing the start and the end of a block from 2 different NFS
clients is a really bad idea. That's why I issue a warning if -n > 0 and -B
is not a power of 2.

Concerning error handling within threads: if a thread encounters a blocking
error, the other threads will finish their current chunk copy, then stop
doing anything.

> And another thought - if the proposed patch and/or method indeed will not
> go anywhere, would it still be helpful for you if the sftp client would
> only expose the file offset functionality? That way, the complexity of
> reassembly and the associated error handling doesn't enter into OpenSSH.

There is no server side change in this patch, so I don't think we can talk
about complexity of reassembly. Once base_chunk_size is set, and a warning or
an error is raised if copy_buffer_len is not a power of 2, there is nothing
more to it than during a reput or a reget. Of course, exposing the file
offset functionality would help with creating new software on top of the sftp
client, but at the cost of simplicity. And if you do this, in the case of
"get", the new software will need to send a lot of "ls" commands in order to
discover the directory structure and the file sizes needed to correctly split
transfers. I feel like it's reinventing the wheel.

-- Cyril
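To make the chunking rules above concrete, here is a rough Python sketch of
the logic Cyril describes (base_chunk_size mirrors the patch's variable; the
helper names are hypothetical, and the patch itself is C):

    # Illustrative only: mirrors the chunking rules described above,
    # not the actual C code from the patch.
    BASE_CHUNK_SIZE = 64 * 1024 * 1024  # the patch's base_chunk_size (64MB)

    def is_power_of_two(n):
        """The check behind the -B (copy_buffer_len) warning."""
        return n > 0 and (n & (n - 1)) == 0

    def chunk_size(file_size, channels, base=BASE_CHUNK_SIZE):
        """Per-channel chunk size: always a multiple of base_chunk_size."""
        per_channel = -(-file_size // channels)  # ceil division
        return -(-per_channel // base) * base    # round up to a multiple of base

    assert chunk_size(4 << 30, 4) == 1 << 30  # 4GB over 4 channels -> four 1GB chunks
    assert is_power_of_two(32768)             # the default -B value is safe
    assert not is_power_of_two(30000)         # a value like this triggers the warning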
Ron Frederick
2020-May-05 14:42 UTC
Parallel transfers with sftp (call for testing / advice)
On May 5, 2020, at 3:55 AM, Cyril Servant <cyril.servant at gmail.com> wrote:
> [...]
>
> If you want to review the code, here is a direct link to the patch:
> https://github.com/openssh/openssh-portable/compare/V_8_2_P1...cea-hpc:V_8_2_P1_PSFTP1

I haven't reviewed the patch in detail, but one thing that jumped out at me
when looking at this was your choice to use multiple channels to implement
the parallelism. The SFTP protocol already allows multiple outstanding reads
or writes on a single channel, without waiting for the server's response. I
don't know if the current OpenSSH client takes advantage of that or not, but
it seems to me that you'd get the same benefit from that as you would from
opening multiple channels, with less setup cost (not having to open the
additional channels, and not having to do separate file open calls on each
channel). It might also make some other things like the progress bar support
easier, as all of the responses would be coming back on the same channel.

You can find an example of this form of parallelism (multiple read/write
requests outstanding on a single channel) in AsyncSSH's SFTP implementation.
By default, it will allow up to 128 parallel reads or writes of 16 KB each,
and you can adjust both the block size and max outstanding requests via
'block_size' and 'max_requests' arguments in the calls to methods such as
get(), put(), and copy() in the asyncssh.SFTPClient class
(https://asyncssh.readthedocs.io/en/latest/api.html#sftpclient).

Here's an example client:

    import asyncio, asyncssh, sys

    async def run_client():
        async with asyncssh.connect('localhost') as conn:
            async with conn.start_sftp_client() as sftp:
                await sftp.get('example.txt')

    try:
        asyncio.get_event_loop().run_until_complete(run_client())
    except (OSError, asyncssh.Error) as exc:
        sys.exit('SFTP operation failed: ' + str(exc))

The block_size and max_requests can be added as optional arguments to the
sftp.get() in this example, but parallelism will automatically be used for
any file larger than 16 KB, allowing for up to 2 MB of data to be outstanding
at once (128 requests of 16 KB each), without any threading and all over a
single SFTP channel.

This parallel I/O support in AsyncSSH also extends to clients making large
read() or write() calls on a remote file. If you call read() or write() with
a length of more than the block_size set when a file is opened, these reads
or writes will automatically be broken into multiple parallel requests.
This behavior can be disabled by setting block_size to 0 if you really need
to control the exact requests sent on the channel, but by default callers
will get the benefit of the extra speed.

--
Ron Frederick
ronf at timeheart.net
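To make the tuning knobs Ron mentions explicit, his example can be extended
like this (the values are arbitrary; block_size and max_requests are the
asyncssh.SFTPClient.get() arguments he names above):

    import asyncio, sys
    import asyncssh

    async def run_client():
        async with asyncssh.connect('localhost') as conn:
            async with conn.start_sftp_client() as sftp:
                # 32 KB blocks with up to 256 requests in flight:
                # as much as 8 MB of data outstanding at once
                await sftp.get('example.txt', block_size=32768, max_requests=256)

    try:
        asyncio.run(run_client())
    except (OSError, asyncssh.Error) as exc:
        sys.exit('SFTP operation failed: ' + str(exc))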
Ben Lindstrom
2020-May-05 15:04 UTC
Parallel transfers with sftp (call for testing / advice)
OpenSSH does:

     -R num_requests
             Specify how many requests may be outstanding at any one time.
             Increasing this may slightly improve file transfer speed but will
             increase memory usage.  The default is 64 outstanding requests.

As well as block size:

     -B buffer_size
             Specify the size of the buffer that sftp uses when transferring
             files.  Larger buffers require fewer round trips at the cost of
             higher memory consumption.  The default is 32768 bytes.

I suspect that for this to be taken seriously it will more than likely need
to have threads dropped in favor of a different method, as the OpenSSH team
has rejected threads multiple times because they cause additional complexity
and security issues.

Ben

Ron Frederick wrote on 5/5/20 9:42 AM:
> [...]
Cyril Servant
2020-May-05 16:03 UTC
Parallel transfers with sftp (call for testing / advice)
Ron Frederick wrote:
> On May 5, 2020, at 8:36 AM, Cyril Servant <cyril.servant at gmail.com> wrote:
>> Ron Frederick wrote:
>>> [...]
>>
>> As Matthieu said earlier, each ssh channel speed is bound to a CPU core
>> speed. The whole point here is to use multiple ssh channels in order to
>> increase transfer speed. And as Ben just said, there is already the -R
>> option in sftp. This option indeed helps, but the channel speed is still
>> bound to a CPU core speed.
>
> Thanks. It looks like OpenSSH's SFTP "-B" and "-R" are indeed similar to
> what I described.
>
> When you say "SSH channel" here, though, do you actually mean multiple data
> streams sent over a single TCP connection, or multiple independent TCP
> connections? I haven't tried to measure this, but I would have expected
> most of the cost of OpenSSH processing to be the decryption, which happens
> at the connection level, not any processing at the channel level. So, I'm a
> bit surprised that you'd be able to get much benefit from multiple cores
> when running multiple channels on a single TCP connection.

I say SSH channel, but I should talk about SSH connection. And as these are
different TCP connections, it allows simultaneous connection to multiple
servers. Of course, this is only useful if these servers share the same
network storage (NFS, Lustre...).

-- Cyril
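As an illustration of what Cyril describes (one file split across several
independent SSH/TCP connections, reassembled locally by writing at offsets),
here is a minimal sketch using AsyncSSH; the host and paths are hypothetical,
and host-key setup and error handling are omitted:

    import asyncio, os
    import asyncssh

    HOST = 'germany0'     # hypothetical login node from the thread's examples
    REMOTE = '/files/5g'  # hypothetical remote path
    LOCAL = '5g.bis'
    CONNS = 4             # number of parallel SSH (TCP) connections

    async def fetch_chunk(fd, offset, length):
        # One SSH connection per chunk, so decryption is spread across
        # CPU cores instead of being bound to a single one.
        async with asyncssh.connect(HOST) as conn:
            async with conn.start_sftp_client() as sftp:
                async with sftp.open(REMOTE, 'rb') as f:
                    await f.seek(offset)
                    remaining = length
                    while remaining > 0:
                        data = await f.read(min(remaining, 1 << 20))
                        if not data:
                            break
                        os.pwrite(fd, data, offset)  # write at the chunk's offset
                        offset += len(data)
                        remaining -= len(data)

    async def main():
        async with asyncssh.connect(HOST) as conn:
            async with conn.start_sftp_client() as sftp:
                size = (await sftp.stat(REMOTE)).size
        chunk = -(-size // CONNS)  # ceil division: bytes per connection
        fd = os.open(LOCAL, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        try:
            os.ftruncate(fd, size)  # sparse preallocation; offset writes reassemble
            await asyncio.gather(*(fetch_chunk(fd, i * chunk,
                                               min(chunk, size - i * chunk))
                                   for i in range(CONNS) if i * chunk < size))
        finally:
            os.close(fd)

    asyncio.run(main())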
David Newall
2020-May-06 04:21 UTC
Parallel transfers with sftp (call for testing / advice)
Did anything happen after
https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers-fast/? I suspect
it did, because we do now allow multiple outstanding packets, as well as
specifying the buffer size.

Daniel explained the process that SFTP uses quite clearly, such that I'm not
sure why re-assembly is an issue. He explained that each transfer already
specifies the offset within the file. It seems reasonable that multiple
writers would just each write to the same file at their various different
offsets. It relies on the target supporting sparse files, but supercomputers
only ever run Linux ;-) which does do the right thing.

The original patch which we are discussing seemed more concerned about being
able to connect to multiple IP addresses, rather than multiple connections
between the same pair of machines. The issue, as I understand, is that the
supercomputer has slow NICs, so adding multiple NICs allows greater network
bandwidth. This, I think, is the problem to be solved; not re-assembly, just
sending to what appear to be multiple different hosts (i.e. IP addresses).

I was curious to know why a supercomputer would have issues receiving at high
bandwidth via a single NIC, while the sending machine has no such performance
issue; but that's an aside.
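David's point about offset writes can be seen in a few lines of Python (a
sketch, assuming a Unix system with sparse-file support; the file name is
arbitrary):

    import os

    # Two writers filling different halves of the same file; the kernel
    # does the "reassembly" as long as sparse files are supported.
    fd = os.open('reassembled.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    os.pwrite(fd, b'B' * 1024, 1024)  # second half arrives first: a hole is left
    os.pwrite(fd, b'A' * 1024, 0)     # first half fills the hole later
    os.close(fd)
    assert os.stat('reassembled.bin').st_size == 2048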
Matthieu Hautreux
2020-May-08 23:25 UTC
Parallel transfers with sftp (call for testing / advice)
On 06/05/2020 06:21, David Newall wrote:
> Did anything happen after
> https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers-fast/? I
> suspect it did, because we do now allow multiple outstanding packets, as
> well as specifying the buffer size.
>
> Daniel explained the process that SFTP uses quite clearly, such that I'm
> not sure why re-assembly is an issue. He explained that each transfer
> already specifies the offset within the file. It seems reasonable that
> multiple writers would just each write to the same file at their various
> different offsets. It relies on the target supporting sparse files, but
> supercomputers only ever run Linux ;-) which does do the right thing.

You are right, reassembly is not an issue, as long as you have sparse file
support, which is our case with Linux :)

> The original patch which we are discussing seemed more concerned about
> being able to connect to multiple IP addresses, rather than multiple
> connections between the same pair of machines. The issue, as I understand,
> is that the supercomputer has slow NICs, so adding multiple NICs allows
> greater network bandwidth. This, I think, is the problem to be solved; not
> re-assembly, just sending to what appear to be multiple different hosts
> (i.e. IP addresses).

No, the primary goal of the patch is to enable doing that between two
endpoints with one NIC per endpoint, the NIC being 10GE or faster. Here is an
example with roughly the same results for a single destination/IP:

# With the patched sftp and 1+10 parallel SSH connections
[me at france openssh-portable]$ ./sftp -n 10 germany0
Connected main channel to germany0 (1.2.3.96).
Connected channel 1 to germany0 (1.2.3.96).
Connected channel 2 to germany0 (1.2.3.96).
Connected channel 3 to germany0 (1.2.3.96).
Connected channel 4 to germany0 (1.2.3.96).
Connected channel 5 to germany0 (1.2.3.96).
Connected channel 6 to germany0 (1.2.3.96).
Connected channel 7 to germany0 (1.2.3.96).
Connected channel 8 to germany0 (1.2.3.96).
Connected channel 9 to germany0 (1.2.3.96).
Connected channel 10 to germany0 (1.2.3.96).
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g                    100% 5120MB 706.7MB/s   00:07
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis                       100% 5120MB 664.0MB/s   00:07
sftp>

# With the legacy sftp:
[me at france openssh-portable]$ sftp germany0
sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/p/scratch/chpsadm/files/5g  100% 5120MB  82.8MB/s   01:01
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis                       100% 5120MB  67.0MB/s   01:16
sftp>

# With scp:
[me at france openssh-portable]$ scp 5g germany0:/files/5g.bis
5g                           100% 5120MB  83.1MB/s   01:01

# With rsync:
[me at france openssh-portable]$ rsync -v 5g germany0:/files/5g.bis
5g
sent 5,370,019,908 bytes  received 35 bytes  85,920,319.09 bytes/sec
total size is 5,368,709,120  speedup is 1.00

> I was curious to know why a supercomputer would have issues receiving at
> some high bandwidth via a single NIC, while the sending machine has no
> such performance issue; but that's an aside.

Supercomputers commonly offer multiple "login nodes" and a generic DNS entry
to connect to one of them randomly: the DNS entry is associated with multiple
IP addresses and the client (DNS resolver) selects one of them. Other DNS
entries may exist to address a particular login node, in case you want to go
to a particular place.
When used with Cyril's patched sftp, this logic means that you are targeting
multiple hosts automatically if you use the generic DNS entry (the first perf
results of Cyril). If you select a particular host DNS entry (like in this
example), then you will only contact that single host.

On supercomputers, files are commonly stored on distributed file systems like
NFS, Lustre, GPFS, ... In case your transfers target one of those types of
file systems, you can use multiple hosts as destinations without any issues.
You just need to ensure that the sftp sent/written blocks are properly sized
to avoid any overwriting of some targets by others because of their file
system client implementations and the asynchronism of the various page cache
flushes on the involved nodes. That is what is done in the patch: as
explained by Cyril in a previous message, the block size used for parallel
transfers was selected with that potential issue in mind.

Regards,
Matthieu
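The login-node round robin Matthieu describes is plain DNS; a quick Python
sketch of the resolver-side random pick (germany0 is the hypothetical host
from the example above):

    import random, socket

    # A generic login-node DNS entry may resolve to several addresses;
    # picking one at random spreads clients across the login nodes.
    infos = socket.getaddrinfo('germany0', 22, type=socket.SOCK_STREAM)
    addrs = sorted({info[4][0] for info in infos})
    print('login node addresses:', addrs)
    print('connecting to:', random.choice(addrs))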
Peter Stuge
2020-May-09 14:57 UTC
Parallel transfers with sftp (call for testing / advice)
Hi,

Cyril Servant wrote:
>> I think you make a compelling argument. I admit that I haven't
>> reviewed the patch, even though that is what matters the most.
>
> If you want to review the code, here is a direct link to the patch:
> https://github.com/openssh/openssh-portable/compare/V_8_2_P1...cea-hpc:V_8_2_P1_PSFTP1

Thanks for that, I've pulled your branch and reviewed locally.

I'm sorry, but I have to say that I think this has zero chance of going
upstream as-is, and it will probably be problematic even with further work.
I'll come back to why.

As-is there are way too many things going on in this proposal at once, and it
seems to make some assumptions which do hold true on Linux and probably other
systems but are not as portable as the rest of the codebase, and thus can't
work on some systems where OpenSSH currently works.

My review feedback follows. Note that I'm not an OpenSSH developer, but have
followed the project for a long time. Some feedback is specific, but most is
rather general. Your change viewed in isolation is not terrible, but I hope
to create an understanding for why the proposal is far too different from the
existing code to be able to go upstream, so that you may be able to avoid
similar situations in the future.

* A new thread queue infrastructure

Please forget about using this pattern in the OpenSSH context. Others have
mentioned that threading requires much more care, and while that is true I
think a simple thread queue is doable, although I think there's an issue or
two still overlooked in the proposed implementation (specifically, threads
combined with child processes especially require carefully dealing with
signals in all threads, which the change doesn't seem to do, and the
thread_loop() logic is not clear - the function will never return).

Anyway, the important point is that a thread queue is utterly different from
the architecture of the entire OpenSSH codebase, it is not nearly as
portable, and introducing such a different paradigm can not be justified for
gaining concurrency in the range of 10-63. There already exists concurrency
infrastructure which will scale fine into that range, based on a select()
call. If you do rework, please look at if/how you can reuse that and do
discuss what you find, in particular discuss beforehand if you find that you
would need to extend it.

* Established behavior is changed, an error condition arbitrarily introduced

The proposed change refuses file transfer if the local and remote files have
different sizes. This has never been a requirement before, and should
certainly not become one. It's also not necessary for what you want to do. If
a user says to transfer a file then the behavior always was and should
continue to be that whatever was there before on the recipient side is simply
overwritten.

* New flags are added to existing functions and new versions of some functions

If you ever find yourself creating a function called "real_$otherfunction"
you can be sure that there exists another, better approach.

The new flags and parameters are introduced on very many layers throughout
the code, in order to propagate higher level state down to the lowest level.
Sometimes that is unavoidable, but really do look hard for a simpler approach
first. You also introduce new global state. Maybe it would be possible to
combine the two. Ideally you'll find a way to change your code such that you
do not actually need to pass any new state so far down in a call stack.
Try hard to confine any changes within as narrow a call stack window as
possible, ideally only a single caller+callee level for each change. Does
that make sense? The existing concurrency infrastructure already solves this
problem btw.

* Special casing is added for the new functionality

In a couple of places new code is added if the new functionality is used, and
the previous code is kept verbatim if not. This approach is untenable in the
long run, and indicates that the existing code would need to change to also
support the new functionality, or sometimes even better: that the new code
needs to go somewhere else, into another layer in the call stack, so that
existing code and logic doesn't need to change.

* Don't add server workarounds to the client

fake_opendir(), reverse_recurse_stat() and wait_availability() do not seem to
belong in the client. The comment about fake_opendir() says:

+ * Open remote directory in order to refresh the file system cache. This is
+ * useful in the case of a parallel upload to a distributed file system (such as
+ * NFS or Lustre)

To me this needs to go into the server; it is the server's responsibility to
present fresh and correct information about the files it has. I don't believe
that this was done for that purpose, but it does look a bit like this
workaround was put in the client to be able to claim that the server
component doesn't need to change, when apparently it does. :\ Personally I
think this is rather pragmatism, and sparing yourself the wait for a changed
server to be in operation on all your relevant nodes.

* Ad-hoc name resolution and random server address choice

The sftp client is changed to always call getaddrinfo() on the server name
and pick a random address out of the result. This was not the case before,
otherwise the change would not have been necessary. Server name resolution is
another thing that is already done consistently in a particular way within
the codebase, but the proposed change ignores that and adds a one-off thing
to one specific code path. That is also untenable for the codebase in the
long run; a different approach is absolutely necessary.

* The thread queue uses "orders" while the existing codebase uses callbacks

This is again about reusing existing patterns in a codebase. Introducing a
completely new and different pattern is sometimes unavoidable, but then it
may make more sense to first refactor the existing codebase, to enable the
addition of some new functionality which would be made possible only by the
new pattern. I don't think this is the case here, however.

* Memory allocations without explanations and without obvious frees

In a few places it looks like new memory allocations are added without
corresponding frees, and there is no explanation e.g. in the commit message.
That makes me worry that leaks are introduced, which would be unfortunate,
since there are probably some folks with long-running client processes. And
the commit that claims to correct memory leaks is also not explained and in
fact looks like it could actually create another memory leak.

* Tests. Nice! :)

* The "optimized sparse file creation" is not obvious and not explained.

* scp ends up being affected by what is proposed as an sftp change. This is
another indicator that something may be changing in places where it
shouldn't.

* Form. This is the least significant problem, but it does still matter.
There is extra whitespace here and there, and there are some unnecessary
casts, e.g. in free() calls.
Please don't understand the above points as some kind of checklist of things
to fix; as I said, that's not my mandate. I just want to contribute
perspective, because I think that your use case is valid, and I appreciate
that you want to contribute to OpenSSH.

Now why this solution may be problematic even with further work:

Given that at least part of the significant mismatch between proposed code
and existing code is due to the nature of the problem, I think it may be a
bit tricky to find a solution contained completely within OpenSSH, at least
right away. With that said, I still think that it makes sense to somehow
allow OpenSSH to work in CPU-bound situations such as yours. The question is
how..

>> likely to introduce a whole load of not very clean error conditions
>> regarding reassembly, which need to be handled sensibly both within
>> the sftp client and on the interface to outside/calling processes.
>> Can you or Cyril say something about this?
>
> Indeed, reassembly must be handled properly...

I only understood in later emails that you actually don't handle reassembly
within OpenSSH, but offload that onto the kernel by using sparse files. I
think that's a wise choice. But it obviously only works given kernel and
filesystem support; I don't know if it's OK to require that. And what happens
without that support?

Another approach would be for the recipient to initially create the full size
file before any data transfer, but that comes at the cost of a probably quite
significant delay. (Maybe not though? Can you test?)

More importantly, this is significantly different from the current behavior;
a file transfer thus far creates or truncates the file on the recipient side
and appends received data, making the file grow, allowing e.g. humans to know
if the transfer was complete by only comparing sizes.

And there we've come full circle, arriving at transferring only part of a
file with sftp, the reassembly problem, and the topic of other tooling
already reliably handling things such as partial transfers, reassembly, and
more...

>> And another thought - if the proposed patch and/or method indeed will not
>> go anywhere, would it still be helpful for you if the sftp client would
>> only expose the file offset functionality? That way, the complexity of
>> reassembly and the associated error handling doesn't enter into OpenSSH.
>
> There is no server side change in this patch, so I don't think we can talk
> about complexity of reassembly.

Indeed the complexity I had in mind isn't present in your proposal, but OTOH
there is plenty of other complexity that I didn't anticipate. Heh. And while
there is no server change, I think there probably should be, instead of
adding that fake_opendir() business to the client.

> Of course, exposing the file offset functionality would help creating a new
> software on top of the sftp client, but at the cost of simplicity.

Your proposed change is also all but simple. I think an approach that
requires only a very small change has a much better chance to get upstream,
but it obviously also depends on what the patch looks like. I would encourage
you to give it a try however.
I guess it should not need to be a very large patch.

> And if you do this, in the case of "get", the new software will need to
> send a lot of "ls" commands in order to be aware of the directory
> structure, the file sizes in order to correctly split transfers

Right, recursive get could probably benefit from further support from the
sftp client, but start simple; what would the simple case look like, where a
user instructs the sftp program to transfer only a part of a file?

Thanks and kind regards

//Peter
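Peter's preallocation question ("Maybe not though? Can you test?") is easy to
probe; a rough Python sketch, assuming a Unix system (file name and size are
arbitrary):

    import os, time

    # How long does preallocating the full file size take on this filesystem?
    size = 5 * 1024 ** 3  # 5GB, like the transfer in Matthieu's example
    fd = os.open('prealloc.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.monotonic()
    os.ftruncate(fd, size)             # usually instant: creates a sparse file
    # os.posix_fallocate(fd, 0, size)  # forces real allocation where supported
    print('preallocation took %.3fs' % (time.monotonic() - start))
    os.close(fd)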
Cyril Servant
2020-May-18 08:14 UTC
Parallel transfers with sftp (call for testing / advice)
Hi Peter, and thank you for your advice, it's really appreciated.

> * A new thread queue infrastructure
>
> Please forget about using this pattern in the OpenSSH context. Others have
> mentioned that threading requires much more care, and while that is true I
> think a simple thread queue is doable, although I think there's an issue or
> two still overlooked in the proposed implementation (specifically, threads
> combined with child processes especially require carefully dealing with
> signals in all threads, which the change doesn't seem to do, and the
> thread_loop() logic is not clear - the function will never return).

The thread exits in thread_real_loop(), but yeah, it may not be clear enough.
Anyway, if we have to avoid using threads, we'll have to rework all this...

> * Established behavior is changed, an error condition arbitrarily introduced
>
> The proposed change refuses file transfer if the local and remote files
> have different sizes. This has never been a requirement before, and should
> certainly not become one. It's also not necessary for what you want to do.
> If a user says to transfer a file then the behavior always was and should
> continue to be that whatever was there before on the recipient side is
> simply overwritten.

No, this error is only raised if the thread wants to write a part of the file
just after the main thread created the sparse file. If the file doesn't have
the right size, this means there has been a problem during the sparse file
creation.

> * Don't add server workarounds to the client
>
> [...]
>
> * Ad-hoc name resolution and random server address choice

Those 2 functionalities were added with our specific HPC clusters in mind. I
think we can simply remove them and just focus on a point to point operation.
For information, the unpatched sftp never resolves hostnames, it just lets
the underlying ssh process do it.

> Another approach would be for the recipient to initially create the full
> size file before any data transfer, but that comes at the cost of a
> probably quite significant delay. (Maybe not though? Can you test?)

Well, without changing anything server-side, the only solution is to write
something at the end of the file. In our tests it creates a sparse file, but
indeed, this must be portable, and we have to test it on multiple
platforms...

Once again, thanks for your advice. I've only answered a few things here,
but as you said, portability is the main subject; then, if an alternative to
threads has to be chosen, we'll have to think about it.

-- Cyril
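On the portability testing Cyril mentions, a quick Python probe one might run
on each target platform to check whether an end-of-file write really produces
a sparse file (the file name and size are arbitrary; st_blocks is
Unix-specific):

    import os

    # Write only the last byte of a would-be 100MB file, then check
    # whether the filesystem actually allocated the skipped range.
    fd = os.open('probe.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    os.pwrite(fd, b'\0', 100 * 1024 * 1024 - 1)
    os.close(fd)
    st = os.stat('probe.bin')
    # st_blocks counts 512-byte blocks actually allocated on disk
    print('sparse' if st.st_blocks * 512 < st.st_size else 'fully allocated')
    os.remove('probe.bin')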