(for details from a source code point of view, see WhatGoesWhere)

TableOfContents

Data structures

Nodeids are unique ids that represent the contents of a file and its position in the project history. See ["Nodeid"].

A revlog, for example .hg/data/somefile.d, is the most important data structure and represents all versions of a file. See ["Revlog"].

A manifest describes the state of a project by listing each file and its nodeid to specify which version. See ["Manifest"].

A changeset lists all files changed in a checkin along with a change description and metadata like user and date. See ["ChangeSet"].

Putting it all together

We now have enough information to see how this all works together. To look up a given revision of a file:

If we want to go the other way and find the changeset associated with a given file revision, we follow the linkrev.

   .--------linkrev-------------.
   v                            |
.---------.    .--------.    .--------.
|changeset| .->|manifest| .->|file    |---.
|index    | |  |index   | |  |index   |   |--.
`---------' |  `--------' |  `--------'   |  |
    |       |      |      |     | `-------'  |
    V       |      V      |     V    `-------'
.---------. |  .--------. |  .---------.
|changeset|-'  |manifest|-'  |file     |
|data     |    |data    |    |revision |
`---------'    `--------'    `---------'

Tracking Working Directory State

The other piece of Mercurial is the working directory. Mercurial tracks various information about the working directory:

For each file that Mercurial controls, we record the following information:

The states that are tracked are:

With this information, we can quickly determine what files in the working directory have changed.

Here's an example of the dirstate:

$ hg parents -q
1224:cc61d366bc3b
$ hg debugstate
n 644        168 08/19/05 17:42:17 .hgignore
n 644        412 08/20/05 01:46:57 .hgtags
n 644       1328 08/26/05 23:22:20 CONTRIBUTORS
n 644      17992 06/30/05 10:19:51 COPYING
n 644        459 08/24/05 00:38:20 MANIFEST.in
n 644        512 08/24/05 00:35:02 Makefile
n 644        232 06/30/05 10:19:51 PKG-INFO
n 644       2736 08/20/05 00:48:33 README
n 644       1676 06/30/05 10:19:51 comparison.txt
n 644       3711 12/31/69 15:59:59 contrib/bash_completion
n 711       1305 07/01/05 15:04:52 contrib/buildrpm
n 755       8300 08/16/05 16:03:59 contrib/convert-repo
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-cat-file
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-diff-tree
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-rev-list
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-rev-tree
n 644        457 06/30/05 10:19:51 contrib/git-viz/hg-viz
n 755       8039 08/16/05 16:03:59 contrib/hgit
n 755      40043 06/30/05 10:19:51 contrib/hgk
...

Merging

Branching and merging

Every working directory is potentially a branch and every user effectively works in their own branch. When Mercurial checks out a branch, it remembers the changeset that directly led to it so that the next checkin will have the correct parent.

To merge two branches, you pull their heads into the same repository, check out one of them and merge the other, and then check in (commit) the result once you're happy with the merge. The resulting checkin has two parents.

Mercurial decides when a merge is necessary by first determining whether the working directory contains uncommitted changes. This determination effectively turns the working directory into a branch of the checked-in version on which it is based. If the working directory is a direct ancestor or descendent of the second version that we're attempting to checkout, Mercurial replaces the working-directory version with the new version. Otherwise it merges the two versions.

Graph merging

Merging a pair of directed acyclic graphs (DAGs) -- the family tree of the file history -- requires determining whether nodes in different graphs correspond. Comparing the node contents (or hashes of the contents) is incorrect because it ignores the history.

However, using the nodeid avoids this error because the nodeid describes the file's contents and its graph position relative to the root. A merge simply checks whether each nodeid in graph A is in graph B and vice versa (for example using a hash table), and Mercurial adds the new nodes to the append-only revlog.

Merging manifests

To merge manifests, first compare them and decide which files need to be added, deleted, and merged.

For each file to be merged, perform a graph merge and resolve conflicts as above. It's important to merge files using per-file DAGs rather than just changeset-level DAGs as this diagram illustrates:

 M   M1   M2

   AB
    |`-------v     M2 clones M (mainline)
   aB       AB     file A is change in mainline
    |`---v  AB'    file B is changed in M2
    |   aB / |     M1 clones M
    |   ab/  |     M1 changes B
    |   ab'  |     M1 merges from M2, changes to B conflict
    |    |  A'B'   M2 changes A
     `---+--.|
         |  a'B'   M2 merges from mainline, changes to A conflict
         `--.|
            ???    depending on which ancestor we choose, we will
                   have to redo A hand-merge, B hand-merge, or both
                   but if we look at the files independently, the
                   merge is easy

The result is a merged version in the working directory, waiting for checkin.

Rollback

When committing or merging, Mercurial adds the changeset entry last. Mercurial keeps a transaction log of the name of each file touched and its length prior to the transaction. On abort, it truncates each file to its prior length. This simplicity is one benefit of making revlogs append-only. The transaction journal also allows an undo operation.

Merging between repositories

A key feature of Mercurial is its ability to merge between independent repositories in a decentralized fashion. Each repository can act as a read-only server or as a client. A client pulls from the server all branches that it has not seen and adds them to its graph. This pull is done in two steps:

  1. Searching for new roots.
    • This part begins by finding all new heads and searching backwards from those heads to the first unknown nodes in their respective branches. These nodes are the roots used to calculate the changegroup: the set of all changesets starting at those roots. Mercurial takes pains to make this search efficient in both bandwidth and round-trips.

  2. Pulling a changegroup.
    • Once the roots are found, the changegroup can be transferred as a single streaming transfer. This transfer is organized as an ordered set of deltas for changesets, manifests, and files. Large chunks of deltas can be directly added to the repository without unpacking so the pull is quick.

See WireProtocol for more details.

References

see also: ["Presentations"]