On Feb 9, 2017, at 1:26 PM, Leonard den Ottolander <leonard at
den.ottolander.nl> wrote:
>
> On Thu, 2017-02-09 at 14:12 -0600, Johnny Hughes wrote:
>> The patch files are in git as text files, right? Why would you need
>> checksums of those? That is the purpose of git, right?
>
> Checksums are there to make sure that you get what you are supposed to
> get.

What failure model are you trying to solve for, specifically?

If you're worried about malicious tampering of the files on the server, how
would your request solve anything? If you don't trust the Git repo you're
cloning from, why would you trust a checksum file stored in that same repo?

If you're worried about a MITM attack, any MITM that can modify Git data
in-flight can produce bogus checksum files in-flight, too.

If you're worried about corrupted data at rest on the remote server or
corruption introduced during the transfer, Git already solves this:
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
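
Concretely, Git names each file's content (a "blob") by hashing a short header plus the raw bytes. A minimal Python sketch, assuming a repository that uses the default SHA-1 object format (Git can also be set up for SHA-256); `git_blob_id` is a hypothetical helper name, but its output should match what `git hash-object` reports for the same content:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    Git hashes a header of the form b"blob <size in bytes>\\0" followed by the
    raw file bytes, so changing even one byte changes the resulting ID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b"hello world\n"))  # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(git_blob_id(b"hellp world\n"))  # one changed byte, completely different ID
```

Because the ID is derived from the content, a corrupted transfer or a flipped bit on disk shows up as an object whose contents no longer match its own name.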

If you want to verify that a given Git clone is consistent:

$ git fsck --full --strict
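
git fsck does much more than this (it also reads packfiles, checks that every referenced object exists, and validates object formats), but its core integrity idea fits in a few lines. This hypothetical `check_loose_objects` function is only a sketch: it assumes a SHA-1 repository, looks at loose objects only, and ignores packs entirely:

```python
import hashlib
import os
import zlib


def check_loose_objects(git_dir: str) -> list:
    """Recompute the ID of every loose object and return the IDs that mismatch.

    Loose objects live at <git_dir>/objects/xx/yyyy..., where "xx" + "yyyy..."
    is the SHA-1 of the zlib-decompressed object (header plus content).
    Packed objects (objects/pack/*.pack) are NOT checked by this sketch.
    """
    bad = []
    objects_dir = os.path.join(git_dir, "objects")
    for prefix in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, prefix)
        # Only the two-character fan-out directories hold loose objects;
        # this skips "info" and "pack".
        if len(prefix) != 2 or not os.path.isdir(subdir):
            continue
        for rest in sorted(os.listdir(subdir)):
            expected = prefix + rest
            with open(os.path.join(subdir, rest), "rb") as f:
                try:
                    raw = zlib.decompress(f.read())
                except zlib.error:
                    # Undecompressable data counts as corruption, too.
                    bad.append(expected)
                    continue
            if hashlib.sha1(raw).hexdigest() != expected:
                bad.append(expected)
    return bad
```

Calling `check_loose_objects(".git")` from the top of a working copy returns an empty list when every loose object still hashes to its own file name.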

Git can do this because its contents are a type of Merkle tree:
https://en.wikipedia.org/wiki/Merkle_tree

Merkle trees are highly resistant to tampering, particularly in the case of
source code, where an attacker must not only change the targeted file but make
a change that a) creates some effect the attacker wants and b) is still legal
code in the programming language being used. Achieving both while keeping the
same SHA-1 hash is Difficult.
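
For illustration, a generic binary Merkle tree can be sketched as below. This is NOT Git's exact layout: Git's tree objects list named entries (mode, name, object ID) per directory rather than pairing hashes, but the key property is the same, since every parent hash covers its children's hashes. `merkle_root` is a hypothetical helper:

```python
import hashlib


def merkle_root(leaves: list) -> str:
    """Compute a binary Merkle root (SHA-1) over a list of byte strings.

    Each leaf is hashed, then adjacent hashes are concatenated and hashed
    pairwise until one hash remains. An odd node out is carried up unchanged.
    """
    if not leaves:
        return hashlib.sha1(b"").hexdigest()
    level = [hashlib.sha1(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(hashlib.sha1(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            parents.append(level[-1])
        level = parents
    return level[0].hex()
```

Changing any single leaf changes the root, so one trusted root hash (roughly what a Git commit records for its top-level tree) pins down every file beneath it.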

I don't know Git internals, but I would expect the above git fsck command to be
pointless immediately after a clone, because Git should already be doing
similar checks during the clone itself. (I've been disappointed by Git's
behavior before, though, so…)

That command should only become useful later, such as after a subsequent git
pull, to detect whether the local copy has bitrotted in the meantime.
> Having checksums for all files (like in a SRPM) is a guarantee

A checksum guarantees nothing by itself. A file's checksum is only as
trustworthy as the source of that checksum. If you don't trust the source to
give you a correct file, you can't trust that same source to give you a valid
checksum. Any bad actor that can compromise one can compromise the other.
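
A tiny, hypothetical demonstration of that point: if the same attacker controls both the file and the checksum published next to it, verification happily passes. All names and contents here are made up for illustration:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, published_checksum: str) -> bool:
    """Accept data only if it matches the checksum we were handed."""
    return sha256_hex(data) == published_checksum


# Hypothetical scenario: one server hosts both the file and its checksum.
honest_file = b"patch: fix buffer overflow\n"
honest_sum = sha256_hex(honest_file)

# An attacker who controls that server replaces BOTH, consistently.
evil_file = b"patch: add backdoor\n"
evil_sum = sha256_hex(evil_file)

print(verify(evil_file, evil_sum))    # prints True: the check passes anyway
print(verify(evil_file, honest_sum))  # prints False: only an independent copy of the sum catches it
```

The check only means something when the expected value arrives over a channel the attacker does not control.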

*Distributed* checksums can sometimes be helpful, if they're maintained by
disparate parties on distributed servers. Here, you're asking some third party
to assert that they got a copy of the same RPM (or whatever) and that they got
checksum XXXXXXX for it. That devolves into a trust relationship, rather than
the math problem it naively looks like: do you trust that party not to be
compromised by the same party that produced the RPM in question?
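
Mechanically, the distributed approach boils down to a vote, as in this hypothetical sketch (`consensus_checksum`, the mirror names, and the threshold are all assumptions for illustration). The code can tell you that N parties agree; it cannot tell you whether those parties are actually independent, which is the trust question above:

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def consensus_checksum(reports: Dict[str, str], min_agree: int) -> Optional[str]:
    """Return the checksum reported by at least `min_agree` parties.

    `reports` maps a party name to the checksum that party says it saw.
    Returns None if no value, or more than one value, reaches the threshold.
    """
    counts = Counter(reports.values())
    winners = [value for value, n in counts.items() if n >= min_agree]
    # Refuse to pick when zero or several different values clear the bar.
    return winners[0] if len(winners) == 1 else None


# Hypothetical example: three mirrors report on the same package.
package = b"example package bytes"
good = hashlib.sha256(package).hexdigest()
reports = {"mirror-a": good, "mirror-b": good, "mirror-c": "0" * 64}
print(consensus_checksum(reports, min_agree=2) == good)  # prints True
```

If mirror-a and mirror-b are both run by, or compromised by, the same party, this vote passes just as happily.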

Another trust problem, which is again a people problem rather than a math
problem, is cryptographic signatures. A signed SRPM is only as trustworthy as
the provider of the signing certificate. Certificate authorities are getting
caught doing untrustworthy things *all the time*. Have you vetted your trusted
CAs, or are you relying on a third party to do that? Why do you trust that
third party to do that job thoroughly?