Ron Frederick
2025-Mar-29 05:06 UTC
Support for transferring sparse files via scp/sftp correctly?
Sorry for the mis-send earlier. Here's the complete message I meant to send.

On Mar 6, 2025, at 9:38 PM, Damien Miller <djm at mindrot.org> wrote:

> If you want this to happen, I recommend starting by figuring out what
> protocol extensions need to be made, and how to support sparse files
> on systems without SEEK_DATA/HOLE - it should be pretty [easy] to do this on
> upload without these flags and without extensions.

I was inspired by this thread to add sparse file support to AsyncSSH, on OSes that support SEEK_DATA and SEEK_HOLE. It looks like I should also be able to get this to work on Windows with FSCTL_QUERY_ALLOCATED_RANGES and FSCTL_SET_SPARSE, but I haven't gotten to that yet.

As Darren Tucker said, the put() operation here can be made to work with any SFTP server. However, an SFTP extension is required to support this for get() or copy(), or for the case where the copy-data extension is used to copy data between files on a remote server without reading it and writing it back over the wire.

I've defined an extension called "ranges@asyncssh.com", modeled somewhat after FXP_READDIR, for getting the valid data ranges in a remote file. Each call can return multiple ranges, but on files with a large number of ranges you may need to send this request multiple times to get the complete list. This allows the copying to be interleaved with getting back range responses.

The request looks like the following:

    uint32   id
    string   "ranges@asyncssh.com"
    string   handle
    uint64   offset
    uint64   length

This requests the valid data ranges in the file associated with the request handle. The offset and length specify the portion of the file for which ranges should be returned.

The response looks like:

    uint32   id
    uint32   count
    repeats count times:
        uint64   offset
        uint64   length
    bool     end-of-list [optional]

The count specifies the number of ranges in the reply.
After the ranges comes an optional bool, end-of-list, which indicates whether there are any more valid data ranges within the request's offset and length. If there are no entries at all within the requested range, an FXP_STATUS of FX_EOF should be sent. If you don't get all of the requested ranges in a single request, additional requests can be sent starting just past the end of the last range previously returned.

What do you think?

--
Ron Frederick
ronf at timeheart.net
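The request and reply layouts described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not AsyncSSH's implementation: it assumes the usual SFTP wire encoding (big-endian uint32/uint64, a string as a uint32 length followed by its bytes, and a bool as a single byte as in RFC 4251), and it covers only the bytes that follow the packet type, so the SSH_FXP_EXTENDED / SSH_FXP_EXTENDED_REPLY framing and the FX_EOF status case are left out. All function names are made up for this sketch.

```python
import struct

# Extension name as given in the message; everything else here is a sketch.
EXT_NAME = b"ranges@asyncssh.com"


def _sftp_string(data: bytes) -> bytes:
    """Encode an SFTP string: uint32 length followed by the raw bytes."""
    return struct.pack(">I", len(data)) + data


def encode_ranges_request(req_id: int, handle: bytes, offset: int, length: int) -> bytes:
    """Build the request body: id, extension name, handle, offset, length.

    Assumption: these bytes follow the SSH_FXP_EXTENDED packet type byte.
    """
    return (struct.pack(">I", req_id)
            + _sftp_string(EXT_NAME)
            + _sftp_string(handle)
            + struct.pack(">QQ", offset, length))


def decode_ranges_reply(payload: bytes):
    """Parse id, count, (offset, length) pairs and the optional end-of-list bool.

    Returns (req_id, ranges, end_of_list); end_of_list is None when the
    sender omitted the optional bool.
    """
    req_id, count = struct.unpack_from(">II", payload, 0)
    pos = 8
    ranges = []
    for _ in range(count):
        off, length = struct.unpack_from(">QQ", payload, pos)
        ranges.append((off, length))
        pos += 16
    end_of_list = bool(payload[pos]) if pos < len(payload) else None
    return req_id, ranges, end_of_list


def next_request_offset(ranges) -> int:
    """Offset for a follow-up request: just past the end of the last range."""
    last_offset, last_length = ranges[-1]
    return last_offset + last_length
```

A client built on this sketch would keep issuing requests from next_request_offset() until the reply signals the end of the list or an FX_EOF status arrives, copying each batch of ranges as it comes in.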
Darren Tucker
2025-Apr-04 01:02 UTC
On Sat, 29 Mar 2025 at 16:14, Ron Frederick <ronf at timeheart.net> wrote:

> [...]
> If you don't get all of the requested ranges in a single request,
> additional requests can be sent starting at just past the end of the last
> range previously returned.
>
> What do you think?

That seems like it'd work well for systems with SEEK_HOLE or an equivalent, although there's always the chance of the underlying file changing between mapping it out and doing the transfer.

Damien pointed out that it's possible to provide reasonable, though not perfect, sparse file support by memcmp'ing your existing file buffer against a block of zeros and skipping the write if it matches. OpenBSD's cp(1) does this (look for "skipholes"): https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/cp/utils.c?annotate=HEAD

This is surprisingly effective when you already have the file content in a buffer anyway, but it would be harder (or at least more expensive) to do as part of a separate request type that returns only the ranges. It'd be easier to implement if there were some kind of "read-sparse" operation that could return a list of {offset, len, data} instead of just the offsets and lengths. That would reduce the time between the sparse check and the read, although it's still potentially racy.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
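The zero-block check described above, combined with the {offset, len, data} shape of the suggested "read-sparse" reply, might look roughly like this in Python. It is a hypothetical sketch, not OpenBSD cp(1)'s code: the 4096-byte block size and both function names are assumptions, and whether skipped regions actually become holes on disk depends on the filesystem.

```python
BLOCK_SIZE = 4096  # assumed block size for the zero check; not taken from cp(1)


def nonzero_segments(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a buffer into (offset, length, chunk) tuples, dropping all-zero blocks.

    Each block is compared against a block of zeros (the Python equivalent of
    memcmp against a zero buffer). Adjacent non-zero blocks are merged, so the
    result resembles the {offset, len, data} list a "read-sparse" reply might carry.
    """
    zero = bytes(block_size)
    segments = []
    start = None  # offset where the current run of non-zero blocks began
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        if block == zero[:len(block)]:
            if start is not None:
                segments.append((start, off - start, data[start:off]))
                start = None
        elif start is None:
            start = off
    if start is not None:
        segments.append((start, len(data) - start, data[start:]))
    return segments


def write_sparse(dst, data: bytes, block_size: int = BLOCK_SIZE) -> None:
    """Write data to a binary file object, seeking over zero blocks instead of writing them."""
    for offset, _length, chunk in nonzero_segments(data, block_size):
        dst.seek(offset)
        dst.write(chunk)
    # Seeking past EOF does not by itself grow the file, so set the final size
    # explicitly in case the data ends with one or more zero blocks.
    dst.truncate(len(data))
```

In this sketch, a run of zeros shorter than a block, or one that does not start on a block boundary, is still written out, and zeros that were explicitly written in the source become holes in the copy; both are part of why the approach is reasonable but not perfect.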