Hi all, I am writing a xdelta-like application as a personal experiment and am busy implementing the rsync protocol, so far so good. I am using C++ templates and creating the algorithms so that operate on any stream, array, etc. through iterators. All seems well except that I am getting a lot of false hits with the weak checksum. When generating checksums of blocksize 1024 on the RedHat 7.1 ISO I generate about 760 000 checksums which go into a hash_multimap. When running the rolling checksums on the RedHat 7.2 ISO (against the checksums in the hash_map) I am getting almost 95% false hits. (ie the weak-checksums match, but when comparing the offsets the comparison fails). Is this expected? Is there a stronger rolling checksum I could implement? It takes about 80seconds to generate all the checksums on a 650MB file, thus I could certainly use a stronger algorithm which cause less false hits. Any ideas would be greatly appreciated. Regards, Stephan Buys PS please reply directly to me as I am not on the mailing list.
The weak checksum in checksum.c (see snippet below) differs substantially from the one discussed in Andrew Tridgell's doctoral thesis on rsync and elsewhere that I've been able to find. I didn't find discussion of the change in the mailing list archives. Well, so I'm curious what the benefit of the change is. (I'm using a rolling checksum in another app, thus the reason for this nitty inspection.) Can anyone provide a pointer to help me understand why this changed? Thanks much. In checksum.c: uint32 get_checksum1(char *buf1,int len) { int i; uint32 s1, s2; schar *buf = (schar *)buf1; s1 = s2 = 0; for (i = 0; i < (len-4); i+=4) { s2 += 4*(s1 + buf[i]) + 3*buf[i+1] + 2*buf[i+2] + buf[i+3] + 10*CHAR_OFFSET; s1 += (buf[i+0] + buf[i+1] + buf[i+2] + buf[i+3] + 4*CHAR_OFFSET); } for (; i < len; i++) { s1 += (buf[i]+CHAR_OFFSET); s2 += s1; } return (s1 & 0xffff) + (s2 << 16); } What I understood from documentation: /* Rolling checksum from Tridgell's PhD thesis. */ int get_weak_checksum(signed char *buf, int len) { int s; int s1, s2; int len4; s1 = s2 = 0; len4 = len / 4; for(i = 0; i < len - 4; i += 4) { s = (buf[i] << 24) + (buf[i+1] << 16) + (buf[i+2] << 8) + (buf[i+3]); s1 += s; s2 += (len4 - 4 * i) * s; } return (s2 << 16) + (s1 & 0xffff); } -- Jeff Jeff Abrahamson <http://www.purple.com/jeff/>