Philipp Marek
2025-Mar-04 10:34 UTC
Support for transferring sparse files via scp/sftp correctly?
> Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
> holes skipped and not transferred as chunks of 0 bytes? [1]
>
> We're asking about sparse files in the >= 1PB range, which consists of
> multi-TB holes with around 600-2000GB of valid data.

Perhaps rsync would be a good fit here, it supports --sparse.
Lionel Cons
2025-Mar-04 10:57 UTC
Support for transferring sparse files via scp/sftp correctly?
On Tue, 4 Mar 2025 at 11:34, Philipp Marek <philipp at marek.priv.at> wrote:
>
> > Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
> > holes skipped and not transferred as chunks of 0 bytes? [1]
> >
> > We're asking about sparse files in the >= 1PB range, which consists of
> > multi-TB holes with around 600-2000GB of valid data.
>
> Perhaps rsync would be a good fit here,
> it supports --sparse.

No. If we were going to use external tools, then mounting an NFSv4.2 filesystem via
https://github.com/kofemann/ms-nfs41-client/blob/master/cygwin/utils/sshnfs/sshnfs.ksh
would be the tool of choice.

I'm talking about NATIVE sparse file support in scp/sftp, via SEEK_DATA/SEEK_HOLE on POSIX
(https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html)
or FSCTL_QUERY_ALLOCATED_RANGES on Windows
(https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_query_allocated_ranges).

We need that because people have tried to copy sparse files before, and it either RUINED the files (by making them non-sparse: the holes were filled in with 0-byte data), or the copies took forever because the holes in sparse files are very large (e.g. multi-TB size).

Lionel
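
For readers unfamiliar with the SEEK_DATA/SEEK_HOLE interface mentioned above, here is a minimal sketch of how a sender could enumerate only the data ranges of a sparse file. It is written in Python for brevity and is not OpenSSH code; the function name data_extents is made up for illustration, and it assumes a platform where Python exposes os.SEEK_DATA and os.SEEK_HOLE (e.g. Linux).

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges that hold data, skipping holes.

    Hypothetical helper for illustration only. Note that it moves the
    file offset of fd as a side effect.
    """
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Jump to the next byte at or after offset that is not in a hole.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                return  # only a hole remains between offset and EOF
            raise
        # End of file counts as an implicit hole, so this always succeeds
        # for a start offset that lies inside the file.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end
```

A transfer tool built on this idea would send each (start, end) range together with its offset and let the receiver seek past the gaps, so the holes are neither read nor written. On a filesystem without hole tracking, the loop may simply report one range covering the whole file.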
Chris Rapier
2025-Mar-04 20:17 UTC
Support for transferring sparse files via scp/sftp correctly?
On 3/4/25 05:34, Philipp Marek via openssh-unix-dev wrote:
>> Does OpenSSH scp/sftp mode transfer sparse files correctly, i.e. are
>> holes skipped and not transferred as chunks of 0 bytes? [1]
>>
>> We're asking about sparse files in the >= 1PB range, which consists of
>> multi-TB holes with around 600-2000GB of valid data.
>
>
> Perhaps rsync would be a good fit here,
> it supports --sparse.
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

I think one of the issues you are going to face is that SEEK_DATA and SEEK_HOLE don't seem to be currently supported under OpenBSD. Since that's the home OS for OpenSSH, this could create portability issues. While you can get around that with the judicious use of defines, it means that the feature set will start to shift between different OSes.

Personally, I think it's a good idea and I may explore it for HPN-SSH, but I think it's going to be a hard sell for the OpenBSD community.

Chris

p.s. Sorry for not replying correctly. My mail seems to be having issues and this is the only email in the thread I've seen.
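
To illustrate the portability concern, here is a rough Python analogue of guarding the feature with defines. It is a sketch, not OpenSSH or HPN-SSH code: the names HAVE_SEEK_HOLE and next_data_extent are hypothetical, and treating EINVAL as "not supported here" is an assumption about how an unsupporting platform would respond.

```python
import errno
import os

# Feature test, loosely analogous to "#ifdef SEEK_DATA" in C.
HAVE_SEEK_HOLE = hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")


def next_data_extent(fd, offset=0, use_seek_hole=HAVE_SEEK_HOLE):
    """Return (start, end) of the next data range at or after offset, or None.

    Hypothetical helper for illustration only. Without hole support the
    rest of the file is reported as a single data range.
    """
    size = os.fstat(fd).st_size
    if offset >= size:
        return None
    if use_seek_hole:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
            return start, os.lseek(fd, start, os.SEEK_HOLE)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                return None  # only a hole remains before EOF
            if exc.errno != errno.EINVAL:
                raise
            # Assumption: EINVAL means the whence value is unsupported here,
            # so fall through to the non-sparse path below.
    # Fallback: no hole information, treat everything up to EOF as data.
    return offset, size
```

The fallback keeps transfers correct everywhere, but it is exactly the shift in feature set described above: the same command would skip holes on one OS and stream the zero-filled ranges on another.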