thr3ads.net - rsync - [librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum

If this information is useful, please help other people find it:
Share via:

Donovan Baarda

2004-Apr-08 05:51 UTC

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

On Thu, 2004-04-08 at 12:36, Martin Pool wrote:> On  5 Apr 2004, Donovan Baarda <abo@minkirri.apana.org.au> wrote:
> 
> > librsync needs a whole file checksum. Without it, it silently fails
for
> > case 1), 3), and 4).
> 
> Yes, a whole-file checksum should be used with it.  Presumably
> something stronger than md4 like SHA-1.
md4 is probably good enough for most applications. I know it's not as
secure as others, but when you take into account the requirement that
the signature match as well, compromising it becomes much more
complicated.
> I think the only question is whether this should be done internally in
> librsync, or as a separate process.  I can see arguments either way.  
>
> In some cases you might prefer to actually store an signed signature
> using something like GPG.
Yeah... good point. The rdiff program should probably use a whole file
md4sum though.
 > > librsync could benefit from a random checksum_seed. It would need to
be
> > included in the signature. Without it librsync is vulnerable to cases
1)
> > and 3).
> 
> Random with respect to what?  I think it would be nice if repeatedly
> summing identicaly files gave identical signatures.  Maybe it can vary
> depending on only the input data...
The problem is repeatability is what makes it vulnerable. If the content
fully determines the signature, the content can be crafted to produce a
particular signature. It is relatively easy to calculate two different
blocks with the same blocksum if the checksum_seed is known.

librsync could be modified to detect when two blocks had the same
blocksum in a signature, and permutate the checksum seed in a
deterministic way to try repeatedly until a signature without collisions
is found. This would give repeatable signatures and protect against case
1), but not case 3). It would also allow crafted files that force m/2
recalculations of the signature.

I think I've just realised what you were getting at; if the
checksum_seed is based on something like the whole file md4sum, it
becomes repeatable, but unpredictable. You can't manipulate individual
blocks without it affecting every other blocksum, but the signature for
the same file is always the same. Nice :-)

This would fit in nicely with adding a whole-file checksum to the
signature... generate the wholefile md4sum, and use that as the starting
"seed" for the individual blocksums. The wholefile sum becomes the
seed.

This doesn't allow for "permutating" the seed if it happens to
result in
blocksum clashes. However, given that the probability if this happening
is pretty astronomical (when using the wholefile sum seed) and will be
caught by a whole-file checksum miss-match, do we care? It is far more
likely to get false block matches when calculating the delta (because
sums are calculated at every byte boundary, not just at block
boundaries).

More thoughts on using the wholefile sum as the seed; at first I thought
this would still be vulnerable to case 3). Using a any single block as
the original file and trying to find any block that matches means you
still have "birthday algorithm" numbers of blocks to check (ie
2^(n/2)).
However, each block "comparison" requires the recalculation of the
"target" blocksum using the "original" blocksum as the seed,
resulting
in non-birthday algorithm number of checksum calculations (ie, 2^n).

I've Cc'd this to the rsync list because they might get some ideas from
it.

-- 
Donovan Baarda <abo@minkirri.apana.org.au>
http://minkirri.apana.org.au/~abo/

Eran Tromer

2004-Apr-08 10:47 UTC

head link

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

On 2004/04/08 08:50, Donovan Baarda wrote:>>In some cases you might prefer to actually store an signed signature
>>using something like GPG.
I think librsync should act as a black box that guarantees file
integrity (which, apparently, requires a whole file checksum). If
someone wants to add authentication or encryption or whatever on top of
that, all the merrier, but let that be done in addition to rsync's own
integrity check.

That means both librsync and (say) GPG would compute a whole file hash.
The communication cost of this duplication is negligible (a typical hash
is much smaller than a typical digital signature). The computation is a
bit more annoying -- it will roughly double the CPU consumption of the
receiver. That can be solved that by revealing the whole file checksum
(when available) via the API, so that librsync users can directly sign
that checksum instead of computing their own.

> if the
> checksum_seed is based on something like the whole file md4sum, it
> becomes repeatable, but unpredictable. You can't manipulate individual
> blocks without it affecting every other blocksum, but the signature for
> the same file is always the same. Nice :-)
Nice indeed, but the cost is enormous: you'll have to read the file
twice. When syncing a mostly-unchanged file that's larger than the disk
cache, that means doubling the runtime (and disk load) on the receiver's
side. Also, it means 'rdiff signature' and equivalents won't work on
arbitrary streams. Currently the receiver can tee the output of 'rdiff
patch' into both the output file and an instance of 'rdiff
signature',
in order to save the IO of re-reading the file upon the next sync; this
won't work anymore. (How about a built-in option for that, BTW?).

> More thoughts on using the wholefile sum as the seed; at first I thought
> this would still be vulnerable to case 3). Using a any single block as
> the original file and trying to find any block that matches means you
> still have "birthday algorithm" numbers of blocks to check (ie
2^(n/2)).
> However, each block "comparison" requires the recalculation of
the
> "target" blocksum using the "original" blocksum as the
seed, resulting
> in non-birthday algorithm number of checksum calculations (ie, 2^n).
I'm afraid it's still vulnerable to case 3 (a pair of "target"
and
"original" files with matching blocks). For simplicity consider
single-block files. In this case what you've done is simply to replace
the hash function
  f(x) = truncate(MD4(x,fixed_seed))
with the hash function
  f'(x) = truncate(MD4(x,MD4(x,fixed_seed)))
which happens to be the same as hashing two copies of x in sequence.
But the birthday attack doesn't care what's the hash function; it only
cares about its input and output sizes (which we didn't change) and that
the function is "sufficiently random".

  Eran

Wayne Davison

2004-Apr-08 16:55 UTC

head link

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

On Thu, Apr 08, 2004 at 03:50:48PM +1000, Donovan Baarda
wrote:> I think I've just realised what you were getting at; if the
> checksum_seed is based on something like the whole file md4sum, it
> becomes repeatable, but unpredictable.
Not so.  Copy the file once, and you'd get all the data you'd need to
create a new local file using a known-signature attack (as long as the
input file didn't change, and that's easy to predict for many files).
I also don't like the doubling of the I/O-cost on the sending side, so
I don't think this is a good way to go.

..wayne..

Maybe Matching Threads

Search for more maybe matching threads

rsync - Apr 2004 - [librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

Maybe Matching Threads