Patrick Pollen
2013-Nov-12 05:04 UTC
Need hint for my question regarding the working of rsync.
Hello,

Since English is my second language, please forgive any typing errors. I have been studying rsync for my academic project. I have learned quite a lot, but I need a little help. My question is: does rsync send all the checksums of a file at once, or in batches? For example, suppose I have a 2 GB file. After generating the checksums on the receiver side, does the receiver send all of the generated checksums to the sender at once? I hope somebody can help by providing a hint. Awaiting your reply.

Thanks.
-Patrick
Wayne Davison
2013-Nov-12 21:50 UTC
Need hint for my question regarding the working of rsync.
On Mon, Nov 11, 2013 at 9:04 PM, Patrick Pollen <patrickpollen2 at gmail.com> wrote:

> For example suppose I have a 2 GB file, so after generating checksum on
> receiver side, does the receiver sends all the generated checksum to the
> sender at once?

Yes, the receiver sends all the checksums that it generates at once (it doesn't keep any of them around, so it sends them in a stream as they are generated). These are both a weak and a strong checksum for each chunk of the file from start to finish. The sending side puts all these checksums into a hash before it starts to read its file, so it doesn't do any transfer work until after the receiver has completely finished all its checksum generation/transmission for the file.

For really big files it would be interesting to amend this rule to one where the sending side waits only long enough for a certain number of checksums to arrive before it begins its work (and perhaps pauses if it gets too far ahead of the arriving checksums). This would allow it to get started with its transfer work much sooner when sending a large file, with the only reduction in matched chunks being any matches that it might have made in data much deeper into the file. If it were also changed to drop early checksums as new ones arrived (perhaps prioritizing removal by an LRU algorithm), it could even avoid the need to make its chunk size super-sized (since large chunks are harder to find matches for). Such an algorithm would be more akin to transferring each really big file as if it were a series of smaller files, but perhaps with some MRU chunk sharing tossed in.

Of course, the downside to such a change would be to increase the number of checksums generated per file (as chunk size decreases, chunk count increases), but with the reduction in pausing it might work out better overall. It would be interesting to try some tuning ideas such as that.

..wayne..
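
For readers following the thread, the current behavior described in the reply (receiver streams a weak and a strong checksum per chunk, sender collects all of them into a hash before reading its own file) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not rsync's actual code: the function names, the tiny block size, the simplified weak checksum, and the choice of MD5 as the strong checksum are all hypothetical, and the weak checksum is recomputed at every offset instead of being rolled.

```python
import hashlib
from collections import defaultdict


def weak_sum(block: bytes) -> int:
    # Simplified Adler-style checksum (an assumption, not rsync's exact formula).
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def strong_sum(block: bytes) -> bytes:
    # MD5 stands in for the strong checksum in this sketch.
    return hashlib.md5(block).digest()


def receiver_checksums(old: bytes, block_size: int):
    # Generator: each (index, weak, strong) triple is yielded as soon as it is
    # computed, so the receiver keeps none of them around.
    for idx, off in enumerate(range(0, len(old), block_size)):
        blk = old[off:off + block_size]
        yield idx, weak_sum(blk), strong_sum(blk)


def sender_delta(new: bytes, checksum_stream, block_size: int):
    # Phase 1: drain the whole stream into a hash table before touching `new`.
    table = defaultdict(list)
    for idx, weak, strong in checksum_stream:
        table[weak].append((strong, idx))

    # Phase 2: scan the sender's file, emitting block references or literal data.
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        match = None
        for strong, idx in table.get(weak_sum(window), ()):
            if strong == strong_sum(window):  # confirm the weak hit
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            pos += len(window)
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops, block_size: int) -> bytes:
    # Receiver side: rebuild the new file from its old blocks plus literal data.
    out = bytearray()
    for kind, val in ops:
        if kind == "copy":
            out += old[val * block_size:(val + 1) * block_size]
        else:
            out += val
    return bytes(out)


old = b"abcdefghijkl"
new = b"abcdXXefghijkl"
ops = sender_delta(new, receiver_checksums(old, 4), 4)
rebuilt = apply_delta(old, ops, 4)
```

Because `sender_delta` finishes its first loop before the second begins, no matching starts until the last checksum has arrived, which is the wait that the windowed idea in the reply would shorten.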