samba-bugs at samba.org
2018-Oct-05 17:34 UTC
[Bug 13645] New: Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

            Bug ID: 13645
           Summary: Improve efficiency when resuming transfer of large files
           Product: rsync
           Version: 3.0.9
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
          Assignee: wayned at samba.org
          Reporter: pe1chl at amsat.org
        QA Contact: rsync-qa at samba.org

When transferring large files over a slow network, we interrupt rsync at the
beginning of business hours, leaving the transfer unfinished. The command used is:

    rsync -av --inplace --bwlimit=400 hostname::module /dest

When restarting the transfer, a lot of time is "wasted": the local system first
reads the partially transferred file and sends its checksums to the remote,
which only then starts reading the source file until it finds something to
transfer. So nothing happens for up to twice the time needed to read the
partial file from disk. When the partial file is many GB, this can take hours.

Suggestions:

1. When the source is larger than the destination, immediately begin
   transferring from the offset in the source equal to the size of the
   destination; it is already known that this part will have to be transferred.

2. Read the partial file at the destination and the same region of the source
   in parallel (so the time is halved), preferably also in parallel with 1.

Of course these optimizations (at least #2) may actually decrease performance
when the transfer is local (not over a slow network) and the disk read rate is
negatively affected by reading at two different places in parallel. So #2
should only be attempted when the transfer is over a network.

--
You are receiving this mail because: You are the QA Contact for the bug.
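Suggestion #1 above amounts to seeking past the already-transferred prefix and appending only the missing tail. A minimal local sketch of that idea with standard tools (dd; the file names are hypothetical, and unlike rsync this performs no checksum verification at all):

```shell
# Create a 12-byte "source" and a 4-byte partial "destination".
printf 'AAAABBBBCCCC' > src.bin
printf 'AAAA' > dst.bin

# Offset = current size of the partial destination file.
SIZE=$(wc -c < dst.bin)

# Suggestion #1: skip SIZE bytes of the source and write the tail at the
# same offset in the destination, leaving the existing prefix untouched.
dd if=src.bin of=dst.bin bs=1 skip="$SIZE" seek="$SIZE" conv=notrunc 2>/dev/null

# The destination now matches the source byte for byte.
cmp -s src.bin dst.bin && echo RESUMED-OK
```

This is exactly the assumption that makes the approach unsafe in general: if the existing prefix of dst.bin did not actually match the source, the error would go undetected.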
samba-bugs at samba.org
2018-Oct-05 17:41 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #1 from Kevin Korb <rsync at sanitarium.net> ---

If you are sure the file has not been changed since it was partially copied,
see --append.
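Applied to the command from the original report, that would look like the following (hostname::module is the reporter's placeholder; --append skips the checksum pass, and --append-verify has been available since rsync 3.0.0):

```shell
# --append assumes the existing destination data is correct and only
# transfers the missing tail, skipping the checksum pass entirely.
rsync -av --append --bwlimit=400 hostname::module /dest

# If the early data might have changed, --append-verify additionally
# verifies the whole file's checksum after the transfer completes.
rsync -av --append-verify --bwlimit=400 hostname::module /dest
```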
samba-bugs at samba.org
2018-Oct-05 17:50 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #2 from Rob Janssen <pe1chl at amsat.org> ---

Thanks, that helps a lot for this particular use case. (The files are backups.)
L A Walsh
2018-Oct-12 22:15 UTC
[Bug 13645] New: Improve efficiency when resuming transfer of large files
If you are doing a local <-> local transfer, you are wasting time with
checksums. You'll get faster performance with "--whole-file". Why do you stop
it at night when you could 'unlimit' the transfer speed? It seems that when you
aren't there would be the best time to copy everything. Doing checksums causes
a noticeable impact on local-file transfers.

On 10/5/2018 10:34 AM, just subscribed for rsync-qa from bugzilla via rsync wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=13645
> When transferring large files over a slow network, ...
> The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest
>
> When restarting the transfer, a lot of time is "wasted" while first the local
> system is reading the partially transferred file and sends the checksums to the remote, ...
>
> Of course these optimizations (at least #2) may actually decrease performance
> when the transfer is local (not over slow network) and the disk read rate is
> negatively affected by reading at two different places in parallel. So #2
> should only be attempted when the transfer is over a network.

Or they might decrease performance on a fast network. I'm not sure what you
mean by 'slow' -- 10Mb? 100Mb? At those speeds I'm not sure, without measuring,
whether checksums make things faster or slower, but I know that at 1000Mb and
10Gb, checksums are prohibitively expensive.

NOTE: you also might look at the protocol you use to do network transfers,
i.e. use rsync from a locally mounted disk to a locally mounted network share,
and make the network share a Samba one. That way you get parallelism
automatically: the file-transfer CPU time happens inside of Samba, while the
local file gathering happens in rsync. I regularly got ~119MB read/write over
1000Mb ethernet.

BTW, any place I use a power-of-two unit like 'B' (byte), I use the
power-of-two (1024-based) prefix, but if I use a singular unit like 'b' (bit),
then I use decimal prefixes. Doing otherwise makes things hard to calculate and
can introduce calculation inaccuracies.
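Walsh's ~119MB figure follows directly from that convention: 1000Mb ethernet is 1000 x 10^6 bits/s, and dividing by 8 bits per byte and 2^20 bytes per MiB gives roughly 119 MiB/s. A quick sanity check:

```shell
# 1000 Mb/s (decimal) ethernet expressed in MiB/s (binary):
# 1000 * 10^6 bits/s  /  8 bits per byte  /  2^20 bytes per MiB
echo $(( 1000 * 1000000 / 8 / 1048576 ))   # -> 119 (integer-truncated)
```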
samba-bugs at samba.org
2018-Nov-20 22:02 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

Wayne Davison <wayned at samba.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #3 from Wayne Davison <wayned at samba.org> ---

Rsync is never going to assume that a file can be continued, as it doesn't
know how the old data compares to the source. You can tell rsync to assume
that the early data is all fine by using --append, but that can cause you
problems if any non-new file needs an update that is not an append.
samba-bugs at samba.org
2018-Nov-21 08:59 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645

--- Comment #4 from Rob Janssen <pe1chl at amsat.org> ---

OK, you apparently did not understand what I proposed. However, it is not that
important, as in our use case we can use --append.