David Chisnall via llvm-dev
2021-Jan-13 11:11 UTC
[llvm-dev] [cfe-dev] RFC: Automated signing of release files
Hi Tom,

It would be good to articulate what you believe the benefit is here.

Signing is generally a process of associating an identity with an artefact so that attestations can be made about that artefact.

Git hashes are intrinsically signatures. They associate the artefact of the latest commit with the identity of the Merkle tree that defines the history. As a result, they allow a user to validate that the code that they have is the same as some other repository. Someone can look at their local depth-1 checkout and validate that it is part of the history of the public repository.

The simplest way for a user to get a cryptographic attestation that they have files that correspond to a revision in our git tree is to get a depth-1 checkout of the repo. This is currently 140MB of data to transfer with git. The extracted tree is 747MB including 149MB of git metadata. The xz-compressed tarball sizes are very different: 86MB without the git info, 230MB with, so there's a big size saving to be had by not including the git data.

Presumably the goal here is to tie the hash of the tarball to a specific git revision with less overhead than including the full git state in the tarball. The core idea here is to allow folks that download the tarball to delegate verifying that it matches the git repo to some other entity.

The simplest way for a user to do this is to grab the tarball over HTTPS directly from GitHub using a URL like:

https://api.github.com/repos/llvm/llvm-project/tarball/0f59d099571d3d803b54e2ce06aa94babb9b26db

This gives a 125MB tarball, so slightly smaller than a depth-1 git checkout (same git commit). GitHub provides a live attestation that this tarball corresponds to the specific revision. You can verify GitHub's TLS certificate to check the identity of the entity providing the attestation and so if you trust GitHub not to lie to you about something that's trivial to verify by doing a git clone then you have the guarantee.

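To make the Merkle-tree point concrete, here is a minimal Python sketch of how git derives object ids from content for a SHA-1 repository. The function names and the in-memory `dict` layout are assumptions made for illustration; real git also handles executable bits, symlinks and submodules, and never stores empty directories. The point is only that the id of the root tree depends on every byte beneath it.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> bytes:
    """Return the raw SHA-1 of a git object: '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def blob_id(content: bytes) -> bytes:
    """Object id of a file's contents."""
    return git_object_id("blob", content)


def tree_id(entries: dict) -> bytes:
    """Object id of a directory.

    `entries` maps a name to bytes (a regular file) or to a nested dict
    (a subdirectory). This layout is hypothetical, chosen for the sketch.
    """
    items = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # git orders directories as if their name ended in '/'.
            items.append((name + "/", b"40000", name, tree_id(value)))
        else:
            items.append((name, b"100644", name, blob_id(value)))
    items.sort(key=lambda item: item[0].encode())
    body = b"".join(
        mode + b" " + name.encode() + b"\0" + oid
        for _, mode, name, oid in items
    )
    return git_object_id("tree", body)


tree_a = {"README": b"hello world\n",
          "src": {"main.c": b"int main(void) { return 0; }\n"}}
tree_b = {"README": b"hello world\n",
          "src": {"main.c": b"int main(void) { return 1; }\n"}}

print(tree_id(tree_a).hex())
print(tree_id(tree_b).hex())
print(tree_id(tree_a) == tree_id(tree_b))
```

Changing a single byte in a nested file changes the id of every tree above it, which is why knowing a commit hash is enough to check an entire checkout.
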
I assume, however, that your use case requires *offline* verification. This gets more tricky because any offline verification of signatures also requires a revocation mechanism and policy. If our signing key is compromised and someone signs a load of tarballs of LLVM + malware as corresponding to a specific git revision that is publicly auditable and doesn't include malware then how do we revoke those signatures?

This gets even more complicated once we start talking about binaries.

Signatures of binaries are typically used to assert that a specific entity performed the build. Binary signatures of open source projects typically attempt to associate the identity of the binary with the identity of the specific source code from which the build was run.

For the former to be useful, there needs to be some notion of identity for the folks doing the build. If your plan is for individual community members to be able to upload builds and have them signed then what is the process going to be to authorise people to upload builds? There's a big difference between builds that we can produce from CI VMs that are initialised to a well-defined state before the build and builds run on a random developer's machine that may be compromised.

For the signature to be useful for associating the build with a source revision, it needs to be verifiable, which means that the build needs to be reproducible. I believe LLVM does now support a reproducible build configuration; do all of the release snapshots build with it? If someone runs a reproducible build and gets different output to the published sources, what is the revocation policy?

Finally, what is the process for verifying the integrity of the binaries on the client? Normally this is something that's tightly coupled with the package management infrastructure.
Windows MSIs, Linux RPMs and Debs, FreeBSD pkgs all use different kinds of signature (including different signing algorithms, different signature serialisation formats, and even different scopes of what is signed). Tarballs have no intrinsic signature mechanism and so would need to be checked by hand.

Operationally:

- If a user finds a signature mismatch, what does it tell them?

- If we discover a malicious binary and need to revoke its signature, how do we do that?

Without a lot more detail, I am opposed to adding generic signing infrastructure. It adds complexity and the perception of security. We need to *very* clearly establish the threat model and security guarantees that we think we are providing before we can discuss whether any given signature workflow actually achieves these guarantees.

David

On 13/01/2021 05:13, Tom Stellard via cfe-dev wrote:
> Hi,
>
> I would like to automate the signing of some of the release files we upload to the release page, starting with the source tarballs. My initial goal is to have a CI job that automatically creates, signs, and uploads the source tarballs, whenever a new release is tagged. I would also like the key used for signing to be a 'project' key and not someone's personal key.
>
> Once this is done, I would like to implement something similar for the release binaries, so that testers could upload the binaries and have them automatically signed. This will be more difficult than the source tarballs, because the binaries are built by individual testers, so we would need to prove that they come from a trust-worthy source.
>
> Implementing these changes will help streamline the release process and let release managers avoid doing a lot of manual mistake-prone tasks.
>
> The questions I have for the community are:
>
> Is this a good idea?
>
> How can I implement this securely?
>
> Thanks,
> Tom
Tom Stellard via llvm-dev
2021-Jan-14 18:35 UTC
[llvm-dev] [cfe-dev] RFC: Automated signing of release files
On 1/13/21 3:11 AM, David Chisnall via cfe-dev wrote:
> Hi Tom,
>
> It would be good to articulate what you believe the benefit is here.
>

Thanks for this reply; it is really helpful. To me the benefit of some kind of automatic signing would just be to save time and also avoid mistakes in the current manual process. From this response and others, it sounds like using GitHub Actions or some other CI system to do the signing may not be the best approach. I may just try to script more of what I'm currently doing on my local machine, as that will help provide some of the benefits I'm looking for.

> Signing is generally a process of associating an identity with an artefact so that attestations can be made about that artefact.
>
> Git hashes are intrinsically signatures. They associate the artefact of the latest commit with the identity of the Merkle tree that defines the history. As a result, they allow a user to validate that the code that they have is the same as some other repository. Someone can look at their local depth-1 checkout and validate that it is part of the history of the public repository.
>
> The simplest way for a user to get a cryptographic attestation that they have files that correspond to a revision in our git tree is to get a depth-1 checkout of the repo. This is currently 140MB of data to transfer with git. The extracted tree is 747MB including 149MB of git metadata. The xz-compressed tarball sizes are very different: 86MB without the git info, 230MB with, so there's a big size saving to be had by not including the git data.
>
> Presumably the goal here is to tie the hash of the tarball to a specific git revision with less overhead than including the full git state in the tarball. The core idea here is to allow folks that download the tarball to delegate verifying that it matches the git repo to some other entity.
>
> The simplest way for a user to do this is to grab the tarball over HTTPS directly from GitHub using a URL like:
>
> https://api.github.com/repos/llvm/llvm-project/tarball/0f59d099571d3d803b54e2ce06aa94babb9b26db
>

GitHub automatically adds a tarball to the release page for us:

https://github.com/llvm/llvm-project/archive/llvmorg-11.0.1.tar.gz

But we stopped relying on this because a user reported that the tarball format was not stable, so you weren't guaranteed to get the exact same bits each time you downloaded it. I'm not sure whether the same issue affects the tarballs accessed by git commit hash as well.

-Tom

>
> This gives a 125MB tarball, so slightly smaller than a depth-1 git checkout (same git commit). GitHub provides a live attestation that this tarball corresponds to the specific revision. You can verify GitHub's TLS certificate to check the identity of the entity providing the attestation and so if you trust GitHub not to lie to you about something that's trivial to verify by doing a git clone then you have the guarantee.
>
> I assume, however, that your use case requires *offline* verification. This gets more tricky because any offline verification of signatures also requires a revocation mechanism and policy. If our signing key is compromised and someone signs a load of tarballs of LLVM + malware as corresponding to a specific git revision that is publicly auditable and doesn't include malware then how do we revoke those signatures?
>
> This gets even more complicated once we start talking about binaries.
>
> Signatures of binaries are typically used to assert that a specific entity performed the build. Binary signatures of open source projects typically attempt to associate the identity of the binary with the identity of the specific source code from which the build was run.
>
> For the former to be useful, there needs to be some notion of identity for the folks doing the build.
> If your plan is for individual community members to be able to upload builds and have them signed then what is the process going to be to authorise people to upload builds? There's a big difference between builds that we can produce from CI VMs that are initialised to a well-defined state before the build and builds run on a random developer's machine that may be compromised.
>
> For the signature to be useful for associating the build with a source revision, it needs to be verifiable, which means that the build needs to be reproducible. I believe LLVM does now support a reproducible build configuration; do all of the release snapshots build with it? If someone runs a reproducible build and gets different output to the published sources, what is the revocation policy?
>
> Finally, what is the process for verifying the integrity of the binaries on the client? Normally this is something that's tightly coupled with the package management infrastructure. Windows MSIs, Linux RPMs and Debs, FreeBSD pkgs all use different kinds of signature (including different signing algorithms, different signature serialisation formats, and even different scopes of what is signed). Tarballs have no intrinsic signature mechanism and so would need to be checked by hand.
>
> Operationally:
>
> - If a user finds a signature mismatch, what does it tell them?
>
> - If we discover a malicious binary and need to revoke its signature, how do we do that?
>
> Without a lot more detail, I am opposed to adding generic signing infrastructure. It adds complexity and the perception of security. We need to *very* clearly establish the threat model and security guarantees that we think we are providing before we can discuss whether any given signature workflow actually achieves these guarantees.
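
On the remark above that the GitHub-generated tarball was not byte-stable: one way to compare two archives without depending on compression settings or timestamps is to compare per-file content digests instead of hashing the archive itself. The following is a minimal sketch under that assumption. `content_manifest` and `make_tarball` are hypothetical helper names, and the sketch ignores file modes, symlinks and any differing top-level directory prefix.

```python
import hashlib
import io
import tarfile


def content_manifest(tar_bytes: bytes) -> dict:
    """Map each regular file's path to the SHA-256 of its contents."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                manifest[member.name] = hashlib.sha256(data).hexdigest()
    return manifest


def make_tarball(files: dict, mtime: int, compresslevel: int) -> bytes:
    """Build an in-memory .tar.gz; used only to set up the demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz",
                      compresslevel=compresslevel) as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


files = {"llvm/README.txt": b"hello\n", "clang/README.txt": b"world\n"}

# Same files, but different timestamps and compression levels.
first = make_tarball(files, mtime=0, compresslevel=9)
second = make_tarball(files, mtime=1_600_000_000, compresslevel=1)

same_bytes = first == second
same_content = content_manifest(first) == content_manifest(second)
print("identical bytes:  ", same_bytes)
print("identical content:", same_content)
```

A detached signature over the archive bytes only helps if those bytes are stable, so a project-generated tarball that is built once and uploaded avoids the problem; a content manifest like this is an alternative when the archive is regenerated on demand.
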