Damien Miller
2025-Mar-07 05:38 UTC
Support for transferring sparse files via scp/sftp correctly?
On Wed, 5 Mar 2025, Cedric Blancher wrote:

> On Tue, 4 Mar 2025 at 21:22, Chris Rapier <rapier at psc.edu> wrote:
> >
> > On 3/4/25 05:34, Philipp Marek via openssh-unix-dev wrote:
> > >> Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
> > >> holes skipped and not transferred as chunks of 0 bytes? [1]
> > >>
> > >> We're asking about sparse files in the >= 1PB range, which consists of
> > >> multi-TB holes with around 600-2000GB of valid data.
> > >
> > > Perhaps rsync would be a good fit here,
> > > it supports --sparse.
> >
> > I think one of the issues you are going to face is that SEEK_DATA and
> > SEEK_HOLE don't seem to be currently supported under OpenBSD. Since
> > that's the home OS for OpenSSH this could create portability issues.
> > While you can get around that with the judicious use of defines it means
> > that the feature set will start to shift between different OSes.
>
> OpenBSD unfortunately does not implement so many other APIs. But other
> OS do implement SEEK_DATA+SEEK_HOLE, including FreeBSD, Linux,
> Solaris, Illumos and even Cygwin. Even NFS has a SEEK to lookup holes
> and data sections in files.
> SEEK_HOLE+SEEK_DATA are also now part of the POSIX standard, so IMO it
> is time to face the bug that sparse files are not handled correctly
> and fix it

You and the others on this thread are IIRC the first people in sftp's 24 year history to ever ask for sparse file support. Its absence is not a bug and adding it will almost certainly require new protocol extensions.

Being pushy with volunteer developers, telling us what our priorities should be, assigning us work, etc. will not have the result you want.
If you want this to happen, I recommend starting by figuring out what protocol extensions need to be made, and how to support sparse files on systems without SEEK_DATA/HOLE - it should be pretty easy to do this on upload without these flags and without extensions.

-d
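A minimal sketch of the upload-side idea Damien mentions, which needs neither SEEK_DATA/SEEK_HOLE nor a protocol extension: read the source in fixed-size chunks and only issue writes for chunks that are not entirely zero. The `write_at` callback (standing in for an SFTP write at an offset), the 64 KiB chunk size and the trailing one-byte write that fixes the file length are assumptions for illustration, not OpenSSH or AsyncSSH code. Whether the destination actually ends up sparse depends on the server's filesystem.

```python
from typing import BinaryIO, Callable

# Hypothetical callback standing in for an SFTP write(handle, offset, data).
WriteAt = Callable[[int, bytes], None]


def sparse_upload(src: BinaryIO, size: int, write_at: WriteAt,
                  chunk: int = 64 * 1024) -> int:
    """Send only the chunks of src that contain non-zero bytes.

    Returns the number of payload bytes actually sent. All-zero chunks are
    skipped, so a server that writes at the requested offset can leave holes.
    """
    sent = 0
    offset = 0
    written_end = 0  # end offset of the last write issued
    while offset < size:
        data = src.read(min(chunk, size - offset))
        if not data:
            break  # source shorter than expected; stop at what we have
        if data.count(0) != len(data):  # chunk has at least one non-zero byte
            write_at(offset, data)
            sent += len(data)
            written_end = offset + len(data)
        offset += len(data)
    # Skipped writes do not extend the remote file, so if the file ends in
    # zeros, write one final zero byte to give it the right length.
    if offset > written_end:
        write_at(offset - 1, b"\0")
        sent += 1
    return sent
```

An FSETSTAT request carrying the size attribute would be an alternative to the trailing one-byte write. Either way, the zero detection costs a scan of every byte on the sending side, which is the price of not using SEEK_DATA/SEEK_HOLE.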
Lionel Cons
2025-Mar-07 10:49 UTC
Support for transferring sparse files via scp/sftp correctly?
On Fri, 7 Mar 2025 at 06:45, Damien Miller <djm at mindrot.org> wrote:
> On Wed, 5 Mar 2025, Cedric Blancher wrote:
> [...]
> You and the others on this thread are IIRC the first people in sftp's
> 24 year history to ever ask for sparse file support.

This is actually not true. ssh.com's ssh had a patch for sparse file support, made by the same people at SUN who did the "X11 untrusted cookie" (ssh -Y) work (Alan Coopersmith, Roland Mainz).
But it never made it past a patch, because it was Solaris-specific, and each filesystem vendor on other platforms had their own custom APIs to look up data ranges in sparse files. Worse even, some APIs were slow, because they enumerated ALL hole and data ranges, which is a no-go for multi-petabyte files these days (SEEK_HOLE works fine, and so does FSCTL_QUERY_ALLOCATED_RANGES on Windows).

This was 22 years ago. The patched ssh.com ssh fell out of use around 8 years ago, and since then we have had a cumbersome and fragile workaround in place, using tar files as containers for sftp transfers. That thing never really works reliably, and shows up on the management radar as "IT outage" too often.

22 years later (from the release of the original ssh.com sparse file patch), we have SEEK_DATA+SEEK_HOLE as an established POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html), which is supported on Linux, Solaris, FreeBSD and even Cygwin.

So in my humble opinion it would be nice to work on sparse file support in OpenSSH.

Lionel
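The reason SEEK_DATA/SEEK_HOLE avoid the "enumerate ALL ranges" problem is that a caller alternates lseek(fd, pos, SEEK_DATA) and lseek(fd, start, SEEK_HOLE), asking only for the next boundary each time, so it can stop or interleave copying at any point. A minimal Python sketch, assuming a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (for example Linux); the function name `data_ranges` and the "whole file is data" fallback are illustrative choices, not OpenSSH code.

```python
import errno
import os


def data_ranges(fd: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for the data regions of an open file.

    Alternates lseek(SEEK_DATA) to find the next data region and
    lseek(SEEK_HOLE) to find where it ends. Falls back to treating the
    whole file as data where these whence values are unavailable.
    """
    size = os.fstat(fd).st_size
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return [(0, size)] if size else []
    ranges = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after pos: trailing hole
                break
            if e.errno == errno.EINVAL and pos == 0:  # whence unsupported here
                return [(0, size)] if size else []
            raise
        # There is an implicit hole at end-of-file, so for a file that is not
        # being modified concurrently this never goes past size.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        ranges.append((start, end - start))
        pos = end
    return ranges
```

Filesystems without hole support typically report the whole file as one data region (Linux does this in its generic fallback), so a copier should treat the result as an optimisation hint and still copy every reported range in full.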
Chris Rapier
2025-Mar-07 16:04 UTC
Support for transferring sparse files via scp/sftp correctly?
On 3/7/25 00:38, Damien Miller wrote:
> On Wed, 5 Mar 2025, Cedric Blancher wrote:
> [...]
> You and the others on this thread are IIRC the first people in sftp's
> 24 year history to ever ask for sparse file support. Its absence is not
> a bug and adding it will almost certainly require new protocol extensions.

This is valid and one of the issues that I brought up with my co-developer when he brought this to my attention.
I am not convinced it is needed except in a few corner cases, as I tend to view SSH more as a transport mechanism than anything else. I don't see the need to replicate what is adequately handled by other tools. That said, I do think it's an interesting problem.

> Being pushy with volunteer developers, telling us what our priorities
> should be, assigning us work, etc. will not have the result you want.

Personally, I'm not asking any of the developers to do this work. I really do apologize if it came across that way. I think it's kind of a big ask and adequately handled by the use of rsync in most use cases. It is something I *might* look at doing because I can see the value for people in my community of HPC science users (which is who I develop HPN-SSH for). However, I, like you, need to balance that against other development priorities. This is on my 'maybe' list but that's about it.

> If you want this to happen, I recommend starting by figuring out what
> protocol extensions need to be made, and how to support sparse files
> on systems without SEEK_DATA/HOLE - it should be pretty easy to do this on
> upload without these flags and without extensions.

Avoiding the use of SEEK_HOLE and SEEK_DATA makes sense from a portability perspective and, after looking at rsync, seems feasible. That said, handling this in terms of the protocol is important. I'm willing, at times, to bend the protocol but not break it.

Chris
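On the receiving side, the portable approach Chris refers to can be sketched as: when writing incoming data, seek over all-zero blocks instead of writing them, then set the final length with ftruncate(). This is roughly what rsync's --sparse does, but the Python below is a simplified illustration, not rsync's code; the function name `copy_sparse` and the 4 KiB block size (ideally matching the filesystem block size) are assumptions.

```python
import os
from typing import BinaryIO


def copy_sparse(src: BinaryIO, dst_fd: int, block: int = 4096) -> int:
    """Copy src into dst_fd, leaving holes where whole blocks are zero.

    dst_fd is assumed to be a freshly created, empty file positioned at
    offset 0. Returns the length the destination file ends up with.
    """
    zero = bytes(block)
    total = 0
    while True:
        piece = src.read(block)
        if not piece:
            break
        if piece == zero[:len(piece)]:
            # Move the file offset without writing; the gap becomes a hole
            # on filesystems that support sparse files.
            os.lseek(dst_fd, len(piece), os.SEEK_CUR)
        else:
            view = memoryview(piece)
            while view:  # os.write() may write fewer bytes than requested
                n = os.write(dst_fd, view)
                view = view[n:]
        total += len(piece)
    # Seeking past EOF does not extend the file, so set the length explicitly.
    os.ftruncate(dst_fd, total)
    return total
```

The same zero-skipping works for a download over plain SFTP: the zeros still cross the wire, but the local copy stays sparse.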
Ron Frederick
2025-Mar-29 04:45 UTC
Support for transferring sparse files via scp/sftp correctly?
On Mar 6, 2025, at 9:38 PM, Damien Miller <djm at mindrot.org> wrote:
> If you want this to happen, I recommend starting by figuring out what
> protocol extensions need to be made, and how to support sparse files
> on systems without SEEK_DATA/HOLE - it should be pretty easy to do this on
> upload without these flags and without extensions.

I was inspired by this thread to add sparse file support to AsyncSSH, on OSes that support SEEK_DATA and SEEK_HOLE. It looks like I should also be able to get this to work on Windows with FSCTL_QUERY_ALLOCATED_RANGES and FSCTL_SET_SPARSE, but I haven't gotten to that yet.

As Darren Tucker said, the put() operation here can be made to work with any SFTP server. However, an SFTP extension is required to support this for get() or copy(), or the case where the copy-data extension is used to copy data between files on a remote server without reading and writing it back over the wire.

I've defined an extension called "ranges at asyncssh.com" which is modeled somewhat after FXP_READDIR for getting valid data ranges in a remote file. Each call can return multiple ranges, but on files with a large number of ranges you may need to call this new method multiple times to get the complete list. This allows for the copying to be interleaved with the range requests.

The extension looks like the following:

    uint32 id

--
Ron Frederick
ronf at timeheart.net
Ron Frederick
2025-Mar-29 05:06 UTC
Support for transferring sparse files via scp/sftp correctly?
Sorry for the mis-send earlier. Here's the complete message I meant to send.

On Mar 6, 2025, at 9:38 PM, Damien Miller <djm at mindrot.org> wrote:
> If you want this to happen, I recommend starting by figuring out what
> protocol extensions need to be made, and how to support sparse files
> on systems without SEEK_DATA/HOLE - it should be pretty easy to do this on
> upload without these flags and without extensions.

I was inspired by this thread to add sparse file support to AsyncSSH, on OSes that support SEEK_DATA and SEEK_HOLE. It looks like I should also be able to get this to work on Windows with FSCTL_QUERY_ALLOCATED_RANGES and FSCTL_SET_SPARSE, but I haven't gotten to that yet.

As Darren Tucker said, the put() operation here can be made to work with any SFTP server. However, an SFTP extension is required to support this for get() or copy(), or the case where the copy-data extension is used to copy data between files on a remote server without reading and writing it back over the wire.

I've defined an extension called "ranges at asyncssh.com" which is modeled somewhat after FXP_READDIR for getting valid data ranges in a remote file. Each call can return multiple ranges, but on files with a large number of ranges you may need to send this request multiple times to get the complete list. This allows for the copying to be interleaved with getting back range responses.

The request looks like the following:

    uint32 id
    string "ranges at asyncssh.com"
    string handle
    uint64 offset
    uint64 length

This requests valid data ranges in the file associated with the request handle. The offset and length specify the portion of the file which the ranges should be returned for.

The response looks like:

    uint32 id
    uint32 count
    repeats count times:
        uint64 offset
        uint64 length
    bool end-of-list [optional]

The count specifies the number of ranges in the reply.
After this is an optional bool which indicates whether there are any more valid data ranges within the request's offset and length. If there are no entries at all within the request range, an FXP_STATUS of FX_EOF should be sent. If you don't get all of the requested ranges in a single request, additional requests can be sent starting just past the end of the last range previously returned.

What do you think?

--
Ron Frederick
ronf at timeheart.net
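A sketch of the client side of the proposed ranges extension, in Python since AsyncSSH is a Python library: packing the request, parsing the reply, and the continuation loop described above (resume just past the end of the last returned range; stop on end-of-list or FX_EOF). Assumptions: the archive renders "@" as " at ", so the extension name is taken to be `ranges@asyncssh.com`; the functions handle the packet body after the SFTP length and type bytes (SSH_FXP_EXTENDED is type 200 and SSH_FXP_EXTENDED_REPLY is type 201); and `query` is a hypothetical callback that sends one request and returns `None` when the server answers with FX_EOF. This is not AsyncSSH's implementation.

```python
import struct
from typing import Callable, Optional

EXT_NAME = b"ranges@asyncssh.com"  # assumed; the archive shows "ranges at asyncssh.com"

Range = tuple[int, int]  # (offset, length)


def _string(b: bytes) -> bytes:
    """SSH 'string': uint32 length followed by the bytes."""
    return struct.pack(">I", len(b)) + b


def build_ranges_request(req_id: int, handle: bytes,
                         offset: int, length: int) -> bytes:
    """Body of the SSH_FXP_EXTENDED request (after length and type)."""
    return (struct.pack(">I", req_id) + _string(EXT_NAME) + _string(handle)
            + struct.pack(">QQ", offset, length))


def parse_ranges_reply(body: bytes) -> tuple[int, list[Range], Optional[bool]]:
    """Parse the SSH_FXP_EXTENDED_REPLY body: id, count, ranges, [end-of-list]."""
    req_id, count = struct.unpack_from(">II", body, 0)
    pos = 8
    ranges = []
    for _ in range(count):
        ranges.append(struct.unpack_from(">QQ", body, pos))
        pos += 16
    # The trailing bool is optional; None means the server did not say.
    end_of_list = bool(body[pos]) if pos < len(body) else None
    return req_id, ranges, end_of_list


# Hypothetical transport hook: sends one ranges request for (offset, length)
# and returns (ranges, end_of_list), or None if the server replied FX_EOF.
Query = Callable[[int, int], Optional[tuple[list[Range], Optional[bool]]]]


def all_ranges(query: Query, offset: int, length: int) -> list[Range]:
    """Collect every data range in [offset, offset + length)."""
    end = offset + length
    result: list[Range] = []
    while offset < end:
        reply = query(offset, end - offset)
        if reply is None:  # FX_EOF: no more data ranges in this window
            break
        ranges, end_of_list = reply
        if not ranges:
            break
        result.extend(ranges)
        if end_of_list:
            break
        last_off, last_len = ranges[-1]
        next_offset = last_off + last_len  # just past the last range returned
        if next_offset <= offset:  # guard against a server that doesn't advance
            break
        offset = next_offset
    return result
```

When the optional end-of-list bool is absent, the loop simply keeps asking until the server answers FX_EOF, which costs one extra round trip but stays correct.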