Hasanat Kazmi
2009-Sep-07 22:56 UTC
Optimizing RSync algorithm using techniques which Google used in Courgette
Hi,

I am a student at LUMS SSE (http://cs.lums.edu.pk) and an active RSync user. A few days ago, Google wrote about Courgette*, an algorithm written specifically for syncing executables. By using Courgette, Google made the diff size 1/10th of that produced by the previous techniques used. I was wondering whether this (or something along the same lines) could be used to optimize RSync. I am a senior and have to do a project, and I am thinking of implementing this in RSync. I need input from the developers. What do you guys think?

* http://dev.chromium.org/developers/design-documents/software-updates-courgette

Hasanat Kazmi
--
Hasanatkazmi at gmail.com
Hasanat Kazmi
+923464362473
Shachar Shemesh
2009-Sep-08 12:27 UTC
Optimizing RSync algorithm using techniques which Google used in Courgette
Hasanat Kazmi wrote:
> Hi,
> I am student at LUMS SSE (http://cs.lums.edu.pk) and an active RSync user.
> Just few days ago, Google wrote about Courgette*: an algorithm which
> is specially written for syncing executables. By using Courgette,
> Google made diff size 1/10th of previous techniques used.
> I was wondering if this (or something on same lines) can be used to
> optimize RSync? I am senior and have to do a project. I am thinking to
> implement this in RSync. I need input from developers. What do you
> guys think?
>
> *http://dev.chromium.org/developers/design-documents/software-updates-courgette
>
> Hasanat Kazmi

Hi Hasanat,

Like you said in the subject, this is an optimization: a format-specific optimization. In other words, it uses a known property of the file being synchronized to make the diff smaller. If you tried to use the Courgette pre-processing on something that is not an executable, you would get significantly worse results than by merely running rsync.

At the moment, for better or for worse, rsync does not do format-specific optimizations. As long as that is the case, rsync cannot be optimized using this algorithm.

Even if we (and by "we", I mean Wayne, or anyone else brave enough to pick this task up) were to implement such functionality, I can think of quite a few things that would have a lot more to gain than executables. In particular, something that would uncompress both source and destination, apply the rsync algorithm to the uncompressed files, and then make sure that recompressing the target produces the exact same result would, IMHO, be much more useful than the change you are suggesting.

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
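The uncompress-then-sync idea in the reply above can be illustrated with a small Python sketch. It is not rsync code: block matching is reduced to fixed-size aligned blocks (real rsync slides a rolling checksum over every byte offset), zlib merely stands in for whatever compression format the files use, and the names matched_fraction, recompresses_identically and make_sample are made up for this illustration.

```python
import hashlib
import random
import zlib

BLOCK = 64  # illustrative block size, chosen for this sketch only


def block_hashes(data: bytes, block: int = BLOCK) -> set:
    """Hashes of the fixed-size, aligned blocks of `data`."""
    return {hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)}


def matched_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of aligned blocks of `new` that also occur somewhere in `old`.

    Simplification: only block-aligned offsets of `new` are tried, so this
    is a rough stand-in for rsync's rolling-checksum search.
    """
    known = block_hashes(old, block)
    blocks = [new[i:i + block] for i in range(0, len(new), block)]
    hits = sum(1 for b in blocks if hashlib.sha256(b).digest() in known)
    return hits / len(blocks)


def recompresses_identically(compressed: bytes, level: int) -> bool:
    """True if decompressing and recompressing reproduces the exact bytes."""
    return zlib.compress(zlib.decompress(compressed), level) == compressed


def make_sample(n_words: int = 4000, seed: int = 0):
    """Build compressible sample data and a copy with one byte changed."""
    rng = random.Random(seed)
    words = [b"rsync", b"delta", b"block", b"checksum",
             b"target", b"source", b"file", b"sync"]
    old = b" ".join(rng.choice(words) for _ in range(n_words))
    new = bytearray(old)
    new[10] ^= 0x01  # a one-byte change near the start of the file
    return old, bytes(new)


if __name__ == "__main__":
    old, new = make_sample()
    print("raw blocks matched:       ", matched_fraction(old, new))
    print("compressed blocks matched:",
          matched_fraction(zlib.compress(old, 6), zlib.compress(new, 6)))
    print("same level round-trips:   ",
          recompresses_identically(zlib.compress(new, 6), 6))
    print("other level round-trips:  ",
          recompresses_identically(zlib.compress(new, 1), 9))
```

A single changed byte leaves nearly every block of the uncompressed data reusable, while the compressed streams share fewer matching blocks because the change alters the encoded stream from that point on. The round-trip check corresponds to the "exact same result" requirement: recompression only reproduces the original bytes when the compressor and its settings match those used to create the file.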