samba-bugs at samba.org
2017-Jan-19  08:05 UTC
[Bug 12530] New: [REQ] Improve fuzzy using files being uploaded
https://bugzilla.samba.org/show_bug.cgi?id=12530
            Bug ID: 12530
           Summary: [REQ] Improve fuzzy using files being uploaded
           Product: rsync
           Version: 3.1.2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: wayned at samba.org
          Reporter: ben.rubson at gmail.com
        QA Contact: rsync-qa at samba.org
Hello,
Let's imagine the sender is uploading a bunch of files which are quite
similar.
For example, consider the following directory:
/directory
|-backup1.iso
|-backup2.iso
|-backup3.iso
|-backup4.iso
|-backup5.iso
For the moment, if no remote fuzzy basis is found at the very beginning of the
transfer, every file will be fully uploaded.
The goal would then be to improve rsync so that, once the first file has been
uploaded, the fuzzy algorithm could use this new file as a fuzzy basis for the
other new files that follow. The same would apply once the second file has
been uploaded, and so on.
Perhaps this could be done once and for all at the very beginning of the
transfer, by also feeding the fuzzy algorithm the list of files that will be
uploaded (sent by the sender) along with their properties.
This would speed up transfers in a number of situations.
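A minimal sketch of the requested behavior, under loudly stated assumptions: it is not rsync's code, and every name in it (`FileInfo`, `pick_fuzzy_basis`, `plan_uploads`, `MAX_NAME_DISTANCE`) is hypothetical. The basis choice loosely follows the rsync man page description of `--fuzzy` (prefer a file with identical size and modification time, otherwise a similarly named file). Plain Levenshtein distance with an arbitrary cut-off stands in for rsync's own name scoring. The only change being illustrated is the last line of the loop, where each finished upload joins the candidate pool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cut-off: names further apart than this are not "similar".
MAX_NAME_DISTANCE = 3


@dataclass
class FileInfo:
    name: str
    size: int
    mtime: int


def name_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a stand-in for rsync's own name scoring)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def pick_fuzzy_basis(target: FileInfo, pool: List[FileInfo]) -> Optional[FileInfo]:
    # 1. Prefer a candidate with identical size and modification time.
    for c in pool:
        if c.size == target.size and c.mtime == target.mtime:
            return c
    # 2. Otherwise take the most similarly named candidate, if close enough.
    best = min(pool, key=lambda c: name_distance(c.name, target.name), default=None)
    if best is not None and name_distance(best.name, target.name) <= MAX_NAME_DISTANCE:
        return best
    return None


def plan_uploads(incoming: List[FileInfo],
                 existing: List[FileInfo]) -> List[Tuple[str, Optional[str]]]:
    """Return (file, basis) pairs; each finished upload becomes a candidate."""
    pool = list(existing)
    plan = []
    for f in incoming:
        basis = pick_fuzzy_basis(f, pool)
        plan.append((f.name, basis.name if basis else None))
        pool.append(f)  # the requested change: reuse files uploaded in this run
    return plan


# The five ISOs from the example, with distinct sizes and mtimes.
isos = [FileInfo(f"backup{i}.iso", 4_000_000_000 + i, 1484812800 + i)
        for i in range(1, 6)]
# backup1.iso has no basis; every later file picks an earlier upload.
print(plan_uploads(isos, existing=[]))
```

The loop only offers files that are already complete on the receiving side. The "once and for all" variant would instead compute the plan from the sender's file list before any data moves.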
Thank you very much!
Ben
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
samba-bugs at samba.org
2023-Oct-17  16:21 UTC
[Bug 12530] [REQ] Improve fuzzy using files being uploaded
https://bugzilla.samba.org/show_bug.cgi?id=12530
Ulrich Sibilller <ulrich.sibiller at atos.net> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ulrich.sibiller at atos.net
--- Comment #1 from Ulrich Sibilller <ulrich.sibiller at atos.net> ---
I would go one step further: rsync should not only look for a file to
reference in fuzzy mode but also take into account what it has already
transferred. Instead of throwing away the information it gathered for the
first file once that file is done, it could keep that transfer information
and reuse it.
It would then
a) automatically fulfill your request, since the information for the first
ISO would already be available;
b) rely not only on similarity of size and/or name, but on the data itself!
Of course, this would increase memory usage, but the user could decide
whether that trade-off is worth it.
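One way to picture this suggestion is a cache of block signatures that outlives each file. The sketch below is hypothetical Python, not rsync's implementation or wire protocol. `BlockCache`, `add_file`, `find_matches`, the tiny block size and the use of MD5 as the strong hash are all illustrative assumptions. The weak checksum follows the rolling (a, b) scheme described for the rsync algorithm, so a new file is matched against earlier files by content rather than by name or size. The index grows with every block added, which is the memory cost mentioned above.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16  # modulus for the two halves of the weak checksum


def weak_sum(block: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum of one block, returned as its (a, b) halves."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


class BlockCache:
    """Hypothetical store of block signatures from files already transferred."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        # weak sum -> [(strong digest, source file, offset in source)]
        self.index: Dict[Tuple[int, int], List[Tuple[bytes, str, int]]] = {}

    def add_file(self, name: str, data: bytes) -> None:
        """Record a signature for every full, block-aligned block of a finished file."""
        n = self.block_size
        for off in range(0, len(data) - n + 1, n):
            block = data[off:off + n]
            entry = (hashlib.md5(block).digest(), name, off)
            self.index.setdefault(weak_sum(block), []).append(entry)

    def find_matches(self, data: bytes) -> List[Tuple[int, str, int]]:
        """Slide over data; return (offset, source file, source offset) per reused block."""
        n = self.block_size
        matches: List[Tuple[int, str, int]] = []
        if len(data) < n:
            return matches
        k = 0
        a, b = weak_sum(data[0:n])
        while True:
            hit = None
            for strong, name, off in self.index.get((a, b), []):
                # Weak sums can collide, so confirm with the strong hash.
                if hashlib.md5(data[k:k + n]).digest() == strong:
                    hit = (k, name, off)
                    break
            if hit:
                matches.append(hit)
                k += n  # jump past the matched block
                if k + n > len(data):
                    break
                a, b = weak_sum(data[k:k + n])
            else:
                if k + n >= len(data):
                    break
                out_byte, in_byte = data[k], data[k + n]
                a = (a - out_byte + in_byte) % M  # roll a forward one byte
                b = (b - n * out_byte + a) % M    # roll b using the new a
                k += 1
        return matches


cache = BlockCache(block_size=4)
cache.add_file("backup1.iso", b"AAAABBBBCCCCDDDD")
print(cache.find_matches(b"xAAAABBBBzzCCCC"))
# [(1, 'backup1.iso', 0), (5, 'backup1.iso', 4), (11, 'backup1.iso', 8)]
```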