Differences between revisions 5 and 6

Commit Signing Plan

1. Problem Statement

Mercurial should provide stronger guarantees about the authenticity of commits, including who made them and optionally who "signed off" on them. It should do so in a manner that scales to large repos with high commit velocity and doesn't lock users into one specific workflow.

2. Background and Threat Models

2.1. Spoofable Author Field

Mercurial only has a single field for author information. It captures a name and email for the person or entity making the commit. However, any user can set any value for the author field. This opens up the possibility for spoofing.

A nefarious person could create a commit that appears to be coming from a well-known and trusted individual. This form of "social engineering attack" could result in a reviewer letting his or her guard down ("person X writes good code: I don't have to pay too much attention") and malicious code being inserted into a repository without deserved scrutiny.

The viability of this attack varies, as many workflows use tools with accounts and these tools commonly expose account info that could be used to reinforce the identity of the patch author/submitter.

2.2. Transport Level Patch Rewriting

Many projects use insecure transmission of patches (email) or third party hosting of commit data (e.g. pull requests). Even self-hosted Mercurial repositories are one exploit away from remote rewriting (the exploit could be in your OS or HTTP server).

A party with MITM capabilities, the ability to coerce a third party hosting provider (such as through a secret court order), or the ability to compromise a machine running a Mercurial server could alter the contents of commits between when the author wrote them and when a trusted party looks at them. This rewriting could introduce a vulnerability, and it could go unnoticed, as people tend to skim over things like exact SHA-1s.

Allowing patch authors to sign commits gives them confidence that tampering with their patches would be noticed.

2.3. Lack of Formal Sign-Off and Trust in Sign-Off

Mercurial currently doesn't formally record who "signed off" on a commit. Many projects have adopted a "two person rule" where any new commit requires at least 2 people: an author and a separate (trusted) person to sign off on it. Organizations like Mozilla have resorted to annotating commit messages with this metadata, e.g. "r=indygreg" (this means "positively reviewed by indygreg").

Anybody who can edit a commit message can add this metadata and create falsified entries.

A nefarious individual could construct a commit (message) that appears to have sign-off and then convince someone to land it.

A similar issue is one where a code author changes a commit before landing it. Someone giving sign-off may find themselves in a position where they gave sign-off, but the landed commit changed in a way that would have invalidated their sign-off. A more formalized method for verifying changes landed exactly as intended could prevent this.

2.4. Lack of Push Log

If a falsified commit gets introduced to the repository, it isn't always clear how it got there because the Mercurial server does not keep a formal log of this. This problem has more or less been solved by the pushlog extension. However, this data only establishes a paper trail: it doesn't provide proactive detection against falsified entries being introduced.

Mozilla's pushlog extension also has a weak point: it is a single point of failure. The log is created on the server and can't be cryptographically proven, so users have to trust that the server is telling the truth.

3. GPG Extension and Its Limitations

Mercurial ships with a "gpg" extension that allows commits to be signed with GPG. Signing works as follows:

Find the SHA-1 of a changeset to be signed
Produce a GPG signature of that SHA-1
Append the signature to the .hgsigs file
Commit the result

See http://selenic.com/repo/hg/rev/b09e5150bf8f for an example commit.

This is the only mechanism currently built in to Mercurial to establish a chain of trust for a commit.

There are some limitations with the existing extension.

First, it isn't practical to sign every commit to the repository. This is because every signing operation requires a new commit to record the added signature(s). This effectively means one extra commit per push operation. In practice, nobody takes this approach. Instead, only a small number of commits are signed. Commonly, it's only release commits or tag commits that are signed.

Second, commit signing isn't scalable for high commit volume workloads. Organizations like Facebook and Mozilla commit to repositories so frequently that there are "push races" to repositories and pushers typically need to rebase before pushing. Since a rebase would rewrite the commit's SHA-1, it would invalidate the GPG signature and require re-signing. This would require one of the following to overcome:

Signers would need to take responsibility for pushing commits they sign off on.
A different entity would have to re-sign the commits.

The first option may be unacceptable to some organizations and workflows, as it effectively requires that the person doing sign-off is the person pushing changesets. There is overhead here.

The second option would break the chain of trust from the original signer. It undermines the purpose of signed commits in the first place and is thus an inadequate solution for people wanting signed commits for trust chain verification.
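
To illustrate why a rebase invalidates a signature over the SHA-1, here is a simplified model of how a changeset node depends on its parents. It assumes the node is the SHA-1 of the two sorted parent nodes followed by the revision text; the function name and sample inputs are hypothetical. In a real rebase the changeset text itself also changes (it embeds the manifest node), so this sketch understates the effect.

```python
import hashlib

NULL_ID = b"\0" * 20  # 20 zero bytes stand in for a missing second parent


def node_hash(text: bytes, p1: bytes, p2: bytes = NULL_ID) -> bytes:
    """Hash the revision text together with its sorted parent nodes."""
    first, second = sorted([p1, p2])
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


text = b"identical changeset text"
old_parent = hashlib.sha1(b"tip when the commit was signed").digest()
new_parent = hashlib.sha1(b"tip after losing a push race").digest()

before = node_hash(text, old_parent)
after = node_hash(text, new_parent)

# The text is unchanged, yet the node differs, so a signature over
# `before` says nothing about `after`.
rebase_changes_node = before != after
```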

4. Proposed Solution: In-Commit Signing

The issues of extra commits and rebasing losing signatures can be worked around by introducing a new method of commit signing.

Instead of signing the SHA-1 of the commit, we will sign a hash covering just the changes in the commit. The commit's SHA-1 is derived from the manifest (the content of all files in the repository at the time of the commit), all ancestor commits, and fields like date, author, and commit message from the commit itself, which is why it changes on every rebase. The signature will be added to the commit itself so that signing doesn't require additional commits.

Generically, the process for signing a changeset is thus:

Build a representation of the changes made in a changeset
Hash that representation
Sign the hash
Add hash and signature to an extra field in the changeset and commit the amended result

The process for verifying a changeset is thus:

Build a representation of the changes made by a changeset
Hash that representation
Verify that hash matches what was signed
Verify the signature of the hash is valid
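
The signing and verification steps above could be sketched as follows. This is only an illustration under stated assumptions: SHA-256 is an arbitrary choice of hash, an HMAC with a shared key stands in for a real GPG signature, and the helper names are hypothetical. The Commit-Hash and Author-Signature field names are the ones proposed in section 4.2.

```python
import hashlib
import hmac


def hash_representation(representation: bytes) -> str:
    # Assumption: SHA-256; the plan does not name a hash function.
    return hashlib.sha256(representation).hexdigest()


def sign(commit_hash: str, key: bytes) -> str:
    # Stand-in for a GPG signature: an HMAC keyed with `key`.
    return hmac.new(key, commit_hash.encode("ascii"), hashlib.sha256).hexdigest()


def sign_changeset(representation: bytes, extra: dict, key: bytes) -> dict:
    """Return a copy of `extra` with the hash and signature added."""
    commit_hash = hash_representation(representation)
    signed = dict(extra)
    signed["Commit-Hash"] = commit_hash
    signed["Author-Signature"] = sign(commit_hash, key)
    return signed


def verify_changeset(representation: bytes, extra: dict, key: bytes) -> bool:
    """Recompute the hash, compare it to what was signed, then check the signature."""
    commit_hash = hash_representation(representation)
    if not hmac.compare_digest(commit_hash, extra.get("Commit-Hash", "")):
        return False
    expected = sign(commit_hash, key)
    return hmac.compare_digest(expected, extra.get("Author-Signature", ""))
```

With real GPG signatures, verification would use the signer's public key instead of recomputing the signature from a shared secret.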

4.1. Creating Representations of Commits

The method for creating the representation of commits for hashing/signing will be based on the built-in changelog text generation and changelog hashing but with the following changes:

Full manifest node will be omitted
Extra fields used to hold signatures will be omitted
Parent changesets will not be included

In the absence of the full manifest node (which is a representation of the state of every file in the commit), we will construct a partial manifest consisting of just the files changed by the commit. We will need to include an explicit list of deleted files, since these aren't explicitly captured by manifests. e.g.

mercurial/hg.py 23cc12f225f1b42f32dc0d897a4f95a38ddc8f4a
mercurial/deleted.py 217bc3fde6d82c0210cf56aeae11d05a03f35b2b d
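
A partial manifest in this shape could be generated roughly like this. The function name is hypothetical, sorting by path is an assumption made so the output is deterministic, and the hash on a deleted line is assumed to be the file node as it existed before deletion (the plan does not say).

```python
def partial_manifest(changed: dict, deleted: dict) -> str:
    """Build the partial manifest text for one commit.

    changed: path -> hex file node for files added or modified
    deleted: path -> hex file node of the version that was removed (assumption)
    """
    lines = []
    for path in sorted(set(changed) | set(deleted)):
        if path in deleted:
            # A trailing "d" marks the path as deleted by this commit.
            lines.append("%s %s d" % (path, deleted[path]))
        else:
            lines.append("%s %s" % (path, changed[path]))
    return "\n".join(lines) + "\n"


text = partial_manifest(
    {"mercurial/hg.py": "23cc12f225f1b42f32dc0d897a4f95a38ddc8f4a"},
    {"mercurial/deleted.py": "217bc3fde6d82c0210cf56aeae11d05a03f35b2b"},
)
```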

The representation and hash of a commit is thus stable as long as the following conditions are met:

The commit date, author, message, branch, and any other fields stored in "extra" (excluding those used to hold signatures) are not modified
If the commit is rebased, no file-level merges occur (the end state of files modified by the commit does not change)

The representation and hash of a commit thus convey the commit metadata and the end state of files changed by the commit (as opposed to commit data, all parent commits, and the end state of all files in the repo at the time of the commit). The produced hash and signature sacrifice some details to achieve flexibility and usability.
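
The stability conditions can be demonstrated with a small sketch. The text layout of the representation, the dict-based changeset, and the use of SHA-256 are all assumptions for illustration; only the idea of omitting parents and signature fields comes from the plan.

```python
import hashlib

# Extra fields that hold signature data are left out of the hash.
SIGNATURE_FIELDS = ("Commit-Hash", "Author-Signature", "Sign-Off-Signature-")


def commit_hash(changeset: dict) -> str:
    """Hash metadata plus the end state of changed files, ignoring parents."""
    extra = {k: v for k, v in changeset["extra"].items()
             if not k.startswith(SIGNATURE_FIELDS)}
    files = changeset["files"]
    lines = [changeset["user"], changeset["date"]]
    lines += ["%s:%s" % (k, extra[k]) for k in sorted(extra)]
    lines += ["%s %s" % (path, files[path]) for path in sorted(files)]
    lines += ["", changeset["message"]]
    # changeset["parents"] is deliberately never read.
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


original = {
    "user": "Example Author <author@example.com>",
    "date": "1428090499 0",
    "message": "hg: fix clone",
    "extra": {"branch": "default"},
    "files": {"mercurial/hg.py": "23cc12f225f1b42f32dc0d897a4f95a38ddc8f4a"},
    "parents": ["a" * 40],
}
rebased = dict(original, parents=["b" * 40])          # new parent, same files
amended = dict(original, message="hg: fix clone v2")  # message edited
```

Here `commit_hash(rebased)` equals `commit_hash(original)`, while `commit_hash(amended)` differs, matching the two conditions listed above.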

4.2. Storing Signatures

Signatures will be stored in "extra" fields as part of the changeset. The following fields will be added:

The Commit-Hash field will store the hash of the commit's representation. (Bikeshed over name is needed.)

The Author-Signature field will capture a signature from the author of the commit. Mercurial will verify that the key used to produce this signature matches the author field in the commit. This field and signature can be used to verify that commits are coming from the person the author field says they are coming from, thus preventing spoofing in the author field. This field is arguably not as important as establishing trust for sign-off.

The Sign-Off-Signature-N fields (where N is an integer) will hold signatures from people signing off on the commit. These signatures can be used to verify that a trusted person reviewed the change and that the change landed exactly as the reviewer intended. We'll start at count 0 or 1 and append new signatures to the end as they arrive. All Sign-Off-Signature-* fields will be ignored when computing the representation of a commit for in-commit signing. This allows signatures to be added or removed without requiring re-signing.
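
As a rough sketch of how sign-off signatures might be appended to "extra", assuming it behaves like a plain dict of strings and that numbering starts at 0 (the plan leaves 0 versus 1 open). The helper names are hypothetical.

```python
import re

SIGN_OFF = re.compile(r"^Sign-Off-Signature-(\d+)$")


def add_sign_off(extra: dict, signature: str, start: int = 0) -> dict:
    """Return a copy of `extra` with the signature stored under the next free N."""
    indices = [int(m.group(1)) for m in map(SIGN_OFF.match, extra) if m]
    next_index = max(indices) + 1 if indices else start
    updated = dict(extra)
    updated["Sign-Off-Signature-%d" % next_index] = signature
    return updated


def sign_offs(extra: dict) -> list:
    """List sign-off signatures in numeric order of N."""
    found = sorted((int(m.group(1)), extra[m.group(0)])
                   for m in map(SIGN_OFF.match, extra) if m)
    return [signature for _, signature in found]
```

Sorting numerically rather than by key name keeps Sign-Off-Signature-10 after Sign-Off-Signature-9.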

4.3. Concerns and Open Issues

Like existing full-commit signatures, in-commit signatures could still get invalidated in a lot of workflows. If a file-level merge occurs on rebase, the signature becomes invalid. If the commit message changes, the signature becomes invalid. This may cause excessive churn and require re-signs. Security or convenience: pick one.

AFAIK, ctx.files() is never validated to be accurate. A malicious person could defeat signing by producing representations of commits that were incomplete. We may need to walk manifests or verify ctx.files() is accurate as part of accepting changesets (behavior behind "server.validate" perhaps?). This could have performance implications.
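
One way such a check might look, sketched with manifests modelled as plain dicts mapping path to file node. This is not Mercurial's API: the function names are hypothetical, and merges (two parents) are ignored for simplicity.

```python
def changed_paths(parent_manifest: dict, manifest: dict) -> set:
    """Paths added, modified, or deleted between the parent and this commit."""
    paths = set(parent_manifest) | set(manifest)
    return {p for p in paths if parent_manifest.get(p) != manifest.get(p)}


def files_list_is_accurate(declared_files, parent_manifest: dict, manifest: dict) -> bool:
    """True if the declared file list matches what the manifests say changed."""
    return set(declared_files) == changed_paths(parent_manifest, manifest)


parent = {"a.py": "n1", "b.py": "n2", "gone.py": "n3"}
child = {"a.py": "n1", "b.py": "n9", "new.py": "n4"}
honest = ["b.py", "gone.py", "new.py"]
incomplete = ["new.py"]  # hides the change to b.py and the deletion
```

Walking both manifests is linear in the number of tracked files, which is where the performance concern comes from.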

We may want to store additional metadata about how the representation and the hash used for a signature are generated. This will allow us to change how things work in the future while still maintaining backwards compatibility.

Should we allow different sets of data to be captured in the commit representation/hash/signature? For example, should we allow signatures to cover just the changed files and not the commit date or commit message?

Adding signatures to commits would rewrite the commit and invalidate the previous SHA-1. That requires a lot of rebasing. Security or convenience: pick one.

- Revision 5 as of 2015-04-03 19:48:19
  Size: 9878
  Editor: GregorySzorc
  Comment: note about commit rewriting
+ Revision 6 as of 2015-04-03 21:56:37
  Size: 10995
  Editor: GregorySzorc
  Comment: split up section to document threat models
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-Mercurial  should provide stronger guarantees about the authenticity of commits,  including who made them and optionally who "signed off" on them. It should do so in a manner that scales to large repos and with commit velocity.
+Mercurial  should provide stronger guarantees about the authenticity of commits,  including who made them and optionally who "signed off" on them. It should do so in a manner that scales to large repos with high commit velocity and doesn't lock users in to one specific workflow.
 Line 5:
-== Background and Issues ==
Mercurial  only has a single field for author information. It captures a name and  email for the person or entity making the commit. However, any user can  set any value for the author field. This opens up the possibility for  spoofing. A nefarious person could create a commit that appears to be coming from a well-known and trusted individual. This form of "social engineering attack" could result in a reviewer letting his or her guard down ("person X writes good code: I don't have to pay too much attention") and malicious code being inserted into a repository unnoticed. The viability of this attack varies, as many workflows use tools with accounts and these tools almost certainly expose account info that could be used to reinforce the identity of the patch author/submitter. But, Mercurial should arguably not have to rely on supplemental tools for commit verification.
+== Background and Threat Models ==
=== Spoofable Author Field ===
Mercurial  only has a single field for author information. It captures a name and  email for the person or entity making the commit. However, any user can  set any value for the author field. This opens up the possibility for  spoofing.
-Line 8:
+Line 9:
-Mercurial currently doesn't formally record who "signed off" on a commit. Many projects have adopted a "two person rule" where any new commit requires at least 2 people: an author and a separate (trusted) person to sign off on it. Organizations like Mozilla have resorted to annotating commit messages with this metadata. e.g. "r=indygreg" (this means "positively reviewed by indygreg"). Of course, anybody who can edit a commit message can add this metadata and create falisified entries. A nefarious individual could construct a commit that appears to have sign-off and then convince someone to land it.
+A nefarious person could create a commit that appears to be coming from a well-known and trusted individual. This form of "social engineering attack" could result in a reviewer letting his or her guard down ("person X writes good code: I don't have to pay too much attention") and malicious code being inserted into a repository without deserved scrutiny.

The viability of this attack varies, as many workflows use tools with accounts and these tools commonly expose account info that could be used to reinforce the identity of the patch author/submitter.

=== Transport Level Patch Rewriting ===
Many projects use insecure transmission of patches (email) or third party hosting of commit data (e.g. pull requests). Even self-hosted Mercurial repositories are one exploit away from remote rewriting (the exploit could be in your OS or HTTP server).

A party with MITM capabilities, the ability to coerce a third party hosting provider (such as through a secret court order), or the ability to hack a server running a Mercurial server could alter the contents of commits between when the author wrote them and when a trusted party looks at them. This rewriting could introduce a vulnerability. This rewriting could potentially go unnoticed, as people tend to glance over things like exact SHA-1s.

Allowing patch authors to sign commits gives them confidence that tampering of their patches would be noticed.

=== Lack of Formal Sign-Off and Trust in Sign-Off ===
Mercurial currently doesn't formally record who "signed off" on a commit. Many projects have adopted a "two person rule" where any new commit requires at least 2 people: an author and a separate (trusted) person to sign off on it. Organizations like Mozilla have resorted to annotating commit messages with this metadata. e.g. "r=indygreg" (this means "positively reviewed by indygreg").

Anybody who can edit a commit message can add this metadata and create falisified entries.

A nefarious individual could construct a commit (message) that appears to have sign-off and then convince someone to land it.
-Line 12:
+Line 29:
-Along that vein is the issue of recording who pushed what where. If a falsified commit gets introduced to the repository, it isn't always clear how it got there because the Mercurial server does not keep a formal log of this. This problem has more or less been solved by the [[https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html|pushlog]] extension. However, this data only establishes a paper trail: it doesn't prevent falsified entries from being introduced.
+=== Lack of Push Log ===
If a falsified commit gets introduced to the repository, it isn't always clear how it got there because the Mercurial server does not keep a formal log of this. This problem has more or less been solved by the [[https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html|pushlog]] extension. However, this data only establishes a paper trail: it doesn't provide proactive detection against falsified entries being introduced.

Mozilla's pushlog extension also has a weak point: single point of failure. The log is created on the server and can't be cryptographically proven. There is trust that the server is telling the truth.

Diff for "CommitSigningPlan"