At $work I have an odd situation involving incomplete file transfers, but I am unsure where the issue may be occurring. Here is the scenario.

Problem:
Sometimes the file transfer seems to have completed, but the file size does not match that on the remote system.

Details:
I transfer a number of large (>1GB) Tar-Gzipped (.tgz) files via SSH tunnels from $customer. Because of some previous issues, sometimes the SSH tunnels may be terminated externally. As a result, I am currently using the 'split' command to break the files into 1-GB "chunks" (ex.: foo.tgz.aa, foo.tgz.ab, ...).

For the rsync transfer, I am using the following options:
    rsync -az \
        -e "ssh ..." \
        --link-dest=/local/path1 \
        --link-dest=/local/path2 \
        --remove-source-files \
        user@remote:/path/to/files \
        /local/path1/

where
'-e "ssh ..."' is the set of SSH options (for tunneling, etc.).
'--link-dest=/local/path1' refers to a local directory that might contain a copy of the file.
'--link-dest=/local/path2' refers to a local directory that might contain a copy of the file.

I frequently find that a file appears to have been transferred but is incomplete. (Example: foo.tgz.ab now exists on the local system, has been removed from the remote, but is incomplete.)

Additional notes:
I do not know whether the 'gzip' '--rsyncable' option is being used (but I do not think so--I suspect the file is created using a command similar to 'tar czf foo.tgz ...').

The rsync commands may be launched from command-line or cron, but use the same format and options in either case. As a result, there may be multiple rsync processes pulling files from the same remote path to the same local path.

I know that while rsync transfers a file (ex.: foo.tgz.ab) it is named '.foo.tgz.ab.??????' (where '.??????' is a 6-character unique extension), and that upon completion the file is renamed to 'foo.tgz.ab'. (So I may see .foo.tgz.ab.4e67d0 and .foo.tgz.ab.fa7325 in the directory while the transfers are going.)

I am unsure if this is a result of the combination of options I am using, or where to begin troubleshooting. Any guidance or direction would be appreciated.

-Albert C.
Francis.Montagnac at inria.fr
2023-Mar-04 07:38 UTC
Trying to diagnose incomplete file transfer
Hi.

On Sat, 04 Mar 2023 00:39:52 -0600 Albert Croft via rsync wrote:
> The rsync commands may be launched from command-line or cron, but use
> the same format and options in either case. As a result, there may be
> multiple rsync processes pulling files from the same remote path to the
> same local path.

I think you should first prevent this from happening.

If your receiving machine is using systemd:
- define an X.service for doing the rsync
- define an X.timer unit to replace using cron
- launch from the command line with: systemctl start X
- this will not start a new rsync if one is already running (if X.service is running)

-- francis
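To make the systemd suggestion concrete, here is a minimal sketch of what the two units might look like. It only writes example files to a scratch directory so they can be reviewed first. The wrapper script path /usr/local/bin/pull-archives.sh and the hourly schedule are assumptions for illustration, not anything from the original setup; the unit name X is the placeholder used above.

```shell
#!/bin/sh
# Sketch only: write example unit files to a scratch directory for review.
set -eu

UNIT_DIR=$(mktemp -d)

# X.service: a oneshot service that runs the transfer once and exits.
cat > "$UNIT_DIR/X.service" <<'EOF'
[Unit]
Description=Pull split archives from the remote host (example)

[Service]
Type=oneshot
# Hypothetical wrapper script holding the rsync command from the original post.
ExecStart=/usr/local/bin/pull-archives.sh
EOF

# X.timer: replaces the cron entry and activates X.service on a schedule.
cat > "$UNIT_DIR/X.timer" <<'EOF'
[Unit]
Description=Run X.service periodically (example schedule)

[Timer]
# Assumed schedule; use whatever interval the cron entry had.
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

echo "Unit files written to $UNIT_DIR"
```

After review, the files would typically be copied to /etc/systemd/system, followed by "systemctl daemon-reload" and "systemctl enable --now X.timer". A manual run is then "systemctl start X", which goes through the same unit as the timer rather than spawning a second independent rsync.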
Albert Croft via rsync <acroft at lists.samba.org> wrote:
> ... I am currently using the 'split' command to break the files
> into 1-GB "chunks" (ex.: foo.tgz.aa, foo.tgz.ab, ...).
> ...
> I am frequently encountering times where the file appears to
> have been transferred but is incomplete. (Example: foo.tgz.ab
> now exists on the local system, has been removed from the remote,
> but is incomplete.)

One thing to check, not in rsync itself but in the preparation of the data: not all versions of "split" support files that aren't text. In particular, some will silently drop null bytes.
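One way to test whether a given "split" is binary-safe is a round trip: split a file that contains null bytes, concatenate the pieces, and compare against the original. The sketch below does that on a small generated file; the file name foo.tgz and the 256k chunk size are just stand-ins, and it assumes "head -c" is available on the system.

```shell
#!/bin/sh
# Round-trip check: does split + cat reproduce a binary file byte for byte?
set -eu

WORK=$(mktemp -d)

# Build a test file: random bytes with a long run of NUL bytes in the middle.
{
    head -c 300000 /dev/urandom
    head -c 100000 /dev/zero
    head -c 300000 /dev/urandom
} > "$WORK/foo.tgz"

# Split by byte count, as one would for the real archives (size is arbitrary here).
split -b 256k "$WORK/foo.tgz" "$WORK/foo.tgz."

# Reassemble the chunks in name order and compare with the original.
if cat "$WORK"/foo.tgz.?? | cmp -s - "$WORK/foo.tgz"; then
    split_result="identical"
else
    split_result="DIFFERENT"
fi
echo "reassembled chunks vs original: $split_result"
```

Running the same check on the system that actually produces the chunks is the point; a "DIFFERENT" result there would put the problem in the preparation step rather than in rsync.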
I think it's very hard to be sure what's going on with --remove-source-files; I think you should drop that option, see whether the problem continues, and if you need the files to be cleaned up, do so in a separate step.

In particular, as someone else suggested, are you *sure* the original copies of the partial files are actually the size you think they are? I don't ever recall seeing a case where rsync just failed to transfer a file and thought it had succeeded; there pretty much has to be something else going on.
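As a rough illustration of cleaning up in a separate step, the sketch below removes a source file only after the received copy has been checked against it. Two local scratch directories stand in for the remote path and /local/path1, and the helper remove_if_verified is a made-up name for this example. Against a real remote, the comparison would more likely be done on checksums gathered over ssh (for example with sha256sum), which is an assumption about the environment.

```shell
#!/bin/sh
# Sketch: delete a source file only once the received copy is verified.
set -eu

# remove_if_verified SRC DST: remove SRC only if DST exists and is byte-identical.
remove_if_verified() {
    if [ -f "$2" ] && cmp -s "$1" "$2"; then
        rm -- "$1"
        echo "verified, removed: $1"
    else
        echo "MISMATCH, kept: $1" >&2
    fi
}

SRC_DIR=$(mktemp -d)   # stands in for the remote directory
DST_DIR=$(mktemp -d)   # stands in for /local/path1

printf 'chunk aa contents\n' > "$SRC_DIR/foo.tgz.aa"
printf 'chunk ab contents\n' > "$SRC_DIR/foo.tgz.ab"

# One complete copy and one deliberately truncated copy.
cp "$SRC_DIR/foo.tgz.aa" "$DST_DIR/foo.tgz.aa"
head -c 5 "$SRC_DIR/foo.tgz.ab" > "$DST_DIR/foo.tgz.ab"

for f in "$SRC_DIR"/foo.tgz.??; do
    remove_if_verified "$f" "$DST_DIR/$(basename "$f")"
done
```

With this split, an incomplete local file never costs the only good copy: the truncated chunk stays on the source side and can simply be transferred again.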