Nick McCarthy
2010-Aug-04 08:46 UTC
Optimising the Rsync algorithm for speed by reverting to MD4 hashing
Hi,

From v3.0.0 onwards the hash function implemented by Rsync was changed from MD4 to MD5 (http://rsync.samba.org/ftp/rsync/src/rsync-3.0.0-NEWS). My understanding is that MD5 is a more secure, slower version of MD4, but I am not convinced that the added security of MD5 alone would have merited the change from MD4 (particularly since MD4 is ~30% faster than MD5). I wonder if I am missing other reasons which made the change necessary/desirable?

I am looking at ways to optimise Rsync (for speed), hence my interest in this.

Thanks,
Nick
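The relative speed of the hash functions mentioned above can be checked on a given machine with a rough timing sketch like the one below. It is only an illustration, not a measurement of rsync itself: it assumes Python's hashlib, the helper name `time_hash` and the 1 MiB buffer are made up for the example, and MD4 is skipped if the local build does not provide it (which is common on systems linked against OpenSSL 3).

```python
import hashlib
import time


def time_hash(name, data, rounds=20):
    """Return the seconds taken to hash `data` `rounds` times.

    Returns None if the local hashlib build does not support `name`
    (hypothetical helper, not part of rsync or hashlib).
    """
    try:
        hashlib.new(name)
    except ValueError:
        return None
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = bytes(1024 * 1024)  # 1 MiB of zero bytes, an arbitrary test buffer
    for algo in ("md4", "md5", "sha1"):
        elapsed = time_hash(algo, payload)
        if elapsed is None:
            print(f"{algo}: not available in this build")
        else:
            print(f"{algo}: {elapsed:.4f} s for 20 MiB")
```

The numbers will vary with CPU, library version, and buffer size, so any ratio it prints should be treated as indicative only.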
andrew.marlow at uk.bnpparibas.com
2010-Aug-04 11:30 UTC
Optimising the Rsync algorithm for speed by reverting to MD4 hashing
I don't know why rsync made this move. My guess is that it does not look good for rsync to use a discredited algorithm. See http://tools.ietf.org/html/draft-turner-md4-to-historic-00.

Creating secure hashing functions is notoriously difficult. Several times, algorithms previously thought secure have been shown to be vulnerable to certain attacks. MD5 has also been discovered to be vulnerable. See the article "MD5 considered harmful today" at http://www.win.tue.nl/hashclash/rogue-ca.

So the question is, does rsync need a hashing algorithm that is cryptographically secure? I suppose it's due in part to the likelihood of different chunks hashing to the same value. With the MD5 vulnerability, a collision has to be specially engineered. IMO it is extremely unlikely that one would happen by chance when used by rsync. If anyone worries about this then maybe rsync would move to SHA-1 at some point. And then what if someone finds a problem with SHA-1? Indeed, Bruce Schneier has an article on this at http://www.schneier.com/blog/archives/2005/02/sha1_broken.html. Again, I reckon that the SHA-1 vulnerability would have no practical effect if SHA-1 were used in rsync. Just my $0.02.

rsync uses the hashing function to fingerprint the chunks. I do not see why this needs to have all the strengths and safeguards of a cryptographic algorithm. Unless rsync is supposed to be defending against protocol attack? Is it? I didn't think so, but I could be wrong; I don't know enough about this bit of the rsync code. If it is trying to defend against this then IMO it should be using an HMAC rather than just a hash code. Assuming it doesn't need these strengths/safeguards, then maybe it should use a cheaper (i.e. quicker) hashing algorithm.

Regards,
Andrew Marlow
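To make the distinction between a plain fingerprint and an HMAC concrete, here is a minimal Python sketch. It is not rsync's actual code: the fixed 4096-byte chunk size, the function names `chunk_fingerprints` and `keyed_fingerprints`, and the use of Python's hashlib and hmac modules are all assumptions made for illustration.

```python
import hashlib
import hmac


def chunk_fingerprints(data, chunk_size=4096):
    """Return one unkeyed MD5 hex digest per fixed-size chunk of `data`.

    Anyone who knows the chunk contents can reproduce these values.
    """
    return [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def keyed_fingerprints(data, key, chunk_size=4096):
    """Return one HMAC-MD5 hex digest per chunk, keyed with `key`.

    Without the shared secret `key`, a third party cannot compute
    the expected value for a chunk of their choosing.
    """
    return [
        hmac.new(key, data[i:i + chunk_size], hashlib.md5).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


if __name__ == "__main__":
    sample = b"A" * 4096 + b"B" * 4096
    print(chunk_fingerprints(sample))
    print(keyed_fingerprints(sample, b"hypothetical-shared-secret"))
```

If the fingerprints only need to detect accidental differences between chunks, the unkeyed form is enough and the choice of hash is mostly a speed question; the keyed form only matters if deliberate tampering is part of the threat model.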