samba-bugs at samba.org
2018-Oct-05 17:34 UTC
[Bug 13645] New: Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
Bug ID: 13645
Summary: Improve efficiency when resuming transfer of large
files
Product: rsync
Version: 3.0.9
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P5
Component: core
Assignee: wayned at samba.org
Reporter: pe1chl at amsat.org
QA Contact: rsync-qa at samba.org
When transferring large files over a slow network, we interrupt rsync at the
beginning of business hours, leaving the transfer unfinished.
The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest
When restarting the transfer, a lot of time is "wasted": first the local
system reads the partially transferred file and sends its checksums to the
remote, which only then starts reading the source file until it finds
something to transfer. So nothing happens for up to twice the time required
to read the partial transfer from the disks! When the partial file is many
GB, this can take hours.
Suggestions:
1. When the source is larger than the destination, immediately begin to
transfer from the offset in the source equal to the size of the destination;
it is already known that this part will have to be transferred.
2. Try to read the partial file at the destination and the same part of the
source in parallel (so the time is halved), preferably also in parallel
with 1.
Of course these optimizations (at least #2) may actually decrease performance
when the transfer is local (not over a slow network) and the disk read rate is
negatively affected by reading at two different places in parallel. So #2
should only be attempted when the transfer is over a network.
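Suggestion 1 is essentially a seek-and-append: skip the part both sides
already have and copy only the missing tail. A minimal local sketch with dd
(hypothetical file names, local files standing in for the remote source, no
rsync involved):

```shell
#!/bin/sh
# Sketch of suggestion 1: when the source is larger than the partial
# destination, copy only the bytes past the destination's current size.
set -e
workdir=$(mktemp -d)
src="$workdir/source.bin"
dst="$workdir/dest.bin"

# Fabricate a 64 KiB "source" and a 25 KiB partial "destination".
dd if=/dev/urandom of="$src" bs=1024 count=64 2>/dev/null
dd if="$src" of="$dst" bs=1024 count=25 2>/dev/null

# Resume from the offset equal to the destination size: everything
# past that point is known to need transferring.
have=$(( $(wc -c < "$dst") ))
dd if="$src" of="$dst" bs=1 skip="$have" seek="$have" conv=notrunc 2>/dev/null

cmp -s "$src" "$dst" && echo "resume complete: files match"
```

Nothing here needs to read the existing destination data twice, which is the
delay the report is about.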
--
You are receiving this mail because:
You are the QA Contact for the bug.
samba-bugs at samba.org
2018-Oct-05 17:41 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
--- Comment #1 from Kevin Korb <rsync at sanitarium.net> ---
If you are sure the file has not been changed since it was partially copied,
see --append.
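For this report that would be the original invocation plus --append; the
host/module names are the reporter's placeholders. (Since rsync 3.0.0,
--append-verify additionally re-checksums the existing destination data
before appending, which trades back some of the startup delay for safety.)

```shell
# Original command, resuming with the assumption that the existing
# destination bytes are already correct:
rsync -av --inplace --append --bwlimit=400 hostname::module /dest

# Safer variant (rsync >= 3.0.0): verify the existing data first.
rsync -av --inplace --append-verify --bwlimit=400 hostname::module /dest
```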
samba-bugs at samba.org
2018-Oct-05 17:50 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
--- Comment #2 from Rob Janssen <pe1chl at amsat.org> ---
Thanks, that helps a lot for this particular use case. (The files are
backups.)
L A Walsh
2018-Oct-12 22:15 UTC
[Bug 13645] New: Improve efficiency when resuming transfer of large files
If you are doing a local <-> local transfer, you are wasting time with
checksums. You'll get faster performance with "--whole-file".

Why do you stop it at night when you could 'unlimit' the transfer speed?
Seems like when you aren't there would be the best time to copy everything.
Doing checksums will cause a noticeable impact on local-file transfers.

On 10/5/2018 10:34 AM, just subscribed for rsync-qa from bugzilla via rsync wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=13645
> When transferring large files over a slow network, ...
> The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest
>
> When restarting the transfer, a lot of time is "wasted" while first the local
> system is reading the partially transferred file and sends the checksums to the remote, ...
>
> Of course these optimizations (at least #2) may actually decrease performance
> when the transfer is local (not over slow network) and the disk read rate is
> negatively affected by reading at two different places in parallel. So #2
> should only be attempted when the transfer is over a network.

Or they might decrease performance on a fast network. Not sure what you mean
by 'slow': 10Mb? 100Mb? I'm not sure without measuring whether checksums are
faster or slower there, but I know that at 1000Mb and 10Gb, checksums are
prohibitively expensive.

NOTE: you also might look at the protocol you use to do network transfers,
i.e. use rsync over a locally mounted disk to a locally mounted network
share, and make the network share a samba one. That way you will get
parallelism automatically -- the file-transfer CPU time will happen inside
of samba, while the local file gathering will happen in rsync. I regularly
got ~119MB/s R/W over 1000Mb ethernet.

BTW, any place I use a power-of-two unit like 'B' (byte), I use the
power-of-two base (1024) prefix, but if I use a singular unit like 'b'
(bit), then I use decimal prefixes. Doing otherwise makes things hard to
calculate and can introduce calculation inaccuracies.
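Under that convention, the ~119MB/s figure is just gigabit line rate
converted from decimal-prefixed bits to binary-prefixed bytes; a quick check:

```shell
# 1000Mb ethernet: decimal prefix on bits.
bits_per_sec=$((1000 * 1000 * 1000))

# 8 bits per byte, then binary (1024-based) prefixes for 'B'.
bytes_per_sec=$((bits_per_sec / 8))          # 125,000,000 B/s
mib_per_sec=$((bytes_per_sec / 1024 / 1024))

echo "${mib_per_sec}MB/s"   # → 119MB/s, matching the observed rate
```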
samba-bugs at samba.org
2018-Nov-20 22:02 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
Wayne Davison <wayned at samba.org> changed:
           What            |Removed        |Added
----------------------------------------------------------------------------
           Status          |NEW            |RESOLVED
           Resolution      |---            |WONTFIX
--- Comment #3 from Wayne Davison <wayned at samba.org> ---
Rsync is never going to assume that a file can be continued, as it doesn't
know what the old data is compared to the source. You can tell rsync to
assume that the early data is all fine by using --append, but that can cause
you problems if any non-new files need an update that is not an append.
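The failure mode is easy to demonstrate without rsync: if bytes the
destination already holds have changed in the source, an append-only resume
silently keeps the stale data. A hypothetical sketch, with dd standing in for
--append's skip-ahead behavior:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
src="$workdir/source.txt"
dst="$workdir/dest.txt"

printf 'OLD-HEADER rest-of-file' > "$src"
dd if="$src" of="$dst" bs=1 count=10 2>/dev/null   # partial copy: "OLD-HEADER"
printf 'NEW-HEADER rest-of-file' > "$src"          # source updated in place

# Append-style resume: copy only the bytes past the destination's size,
# trusting (wrongly, here) that the earlier bytes still match.
have=$(( $(wc -c < "$dst") ))
dd if="$src" of="$dst" bs=1 skip="$have" seek="$have" conv=notrunc 2>/dev/null

cmp -s "$src" "$dst" || echo "silent corruption: destination kept OLD-HEADER"
```

The destination ends up internally consistent but wrong, which is exactly why
--append is only safe when the source is known to grow by appending.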
samba-bugs at samba.org
2018-Nov-21 08:59 UTC
[Bug 13645] Improve efficiency when resuming transfer of large files
https://bugzilla.samba.org/show_bug.cgi?id=13645
--- Comment #4 from Rob Janssen <pe1chl at amsat.org> ---
OK, you apparently did not understand what I proposed. However, it is not
that important, as in our use case we can use --append.