Diff for "Design" - Mercurial

Differences between revisions 21 and 22

(for details from a source code point of view, see WhatGoesWhere)

Data structures

Nodeids

Nodeids are unique ids that represent the contents of a file and its position in the project history. For now they are computed using the SHA1 hash function, which generates 160 bits (40 hex digits). If you modify a file, commit the change, and then modify it to restore the original contents, the contents are the same but the history is different, so the file will get a new nodeid. This history-sensitivity is obtained by calculating the nodeid from the concatentation of the parent nodeids with the file's contents.

Nodeids are typically presented to the user as shortened hex strings, like this:

$ hg id
8d43f8c0b836 tip

The nodeid 00000... is special and is known as the nullid. It serves as the empty root revision. This has the nice property that otherwise unrelated revisions have a common empty ancestor.

Revlogs

A revlog, for example .hg/data/somefile.d, is the most important data structure and represents all versions of a file. Each version is stored compressed in its entirety or stored as a compressed binary delta (difference) relative to the preceeding version in the revlog. Whether to store a full version is decided by how much data would be needed to reconstruct the file. This system ensures that Mercurial does not need huge amounts of data to reconstruct any version of a file, no matter how many versions are stored.

The reconstruction requires a single read, if Mercurial knows when and where to read. Each revlog therefore has an index, for example .hg/data/somefile.i, containing one fixed-size record for each version. The record contains:

the nodeid of the file version
the nodeids of its parents
the length of the revision data
the offset in the revlog saying where to begin reading
the base of the delta chain
the linkrev pointing to the corresponding changeset

Here's an example:

$ hg debugindex .hg/data/README.i
   rev    offset  length   base linkrev nodeid       p1           p2
     0         0    1125      0       0 80b6e76643dc 000000000000 000000000000
     1      1125     268      0       1 d6f755337615 80b6e76643dc 000000000000
     2      1393      49      0      27 96d3ee574f69 d6f755337615 000000000000
     3      1442     349      0      63 8e5de3bb5d58 96d3ee574f69 000000000000
     4      1791      55      0      67 ed9a629889be 8e5de3bb5d58 000000000000
     5      1846     100      0      81 b7ac2f914f9b ed9a629889be 000000000000
     6      1946     405      0     160 1d528b9318aa b7ac2f914f9b 000000000000
     7      2351      39      0     176 2a612f851a95 1d528b9318aa 000000000000
     8      2390       0      0     178 95fdb2f5e08c 2a612f851a95 2a612f851a95
     9      2390     127      0     179 fc5dc12f851b 95fdb2f5e08c 000000000000
    10      2517       0      0     182 24104c3ccac4 fc5dc12f851b fc5dc12f851b
    11      2517     470      0     204 cc286a25cf37 24104c3ccac4 000000000000
    12      2987     346      0     205 ffe871632da6 cc286a25cf37 000000000000
...

With one read of the index to fetch the record and then one read of the revlog, Mercurial can reconstruct any version of a file in time proportional to the file size.

So that adding a new version requires only O(1) seeks, the revlogs and their indices are append-only.

Revlogs are also used for manifests and changesets.

Manifests

A manifest describes the state of a project by listing each file and its nodeid to specify which version. Recreating a particular state means simply looking up its manifest and reconstructing the listed file versions from their revlogs. The manifest is conceptually a file. All of its versions, which collectively represent the entire project history -- are stored in a revlog (see the file .hg/00manifest.d) and an associated index (.hg/00manifest.i).

A manifest looks something like this:

$ hg manifest
a240bca3a32747c68c413e090d94fb8d8b7945d1 644 .hgignore
64903d6bc1681739840d3180b984fff3526001ab 644 .hgtags
ea694a28b9b10803b7708cdf43f56f2589078b52 644 CONTRIBUTORS
5ac863e17c7035f1d11828d848fb2ca450d89794 644 COPYING
246219e9cd134000b910f65f395f3d60f66719b9 644 MANIFEST.in
e81018fed6bda3460c37dfc046d6601b10dbb201 644 Makefile
615e66a1a3b9580a06834fdd3470d1fc58d3bde5 644 PKG-INFO
a903ec460ca828fa8b7c4ca8f3abb97e6697457d 644 README
b0166ba7a8977756db92a88634da162844af978f 644 comparison.txt
b5bb77bceabd6932eacb0dae5a060dee0b6c70d3 644 contrib/bash_completion
5ca6d79f9a1d5e328de9adc65bc850ab771cf927 755 contrib/buildrpm
cc9ffe9f086735501b0baa207360d4478fd491d5 755 contrib/convert-repo
78f7c038716f258f451528e1e8241d895419f2ee 644 contrib/git-viz/git-cat-file
78f7c038716f258f451528e1e8241d895419f2ee 644 contrib/git-viz/git-diff-tree
78f7c038716f258f451528e1e8241d895419f2ee 644 contrib/git-viz/git-rev-list
78f7c038716f258f451528e1e8241d895419f2ee 644 contrib/git-viz/git-rev-tree
b58aa4c0ea58bb671532171ac4a2cd5957661531 644 contrib/git-viz/hg-viz
9d5ecbd3177e2cd88b9202e4fc27c60d410eae25 755 contrib/hgit
e46eccf6b286bbfeb0547f4940ada1aa5f6305ab 755 contrib/hgk
...

Changesets

A changeset lists all files changed in a checkin along with a change description and metadata like user and date. It also contains a nodeid of the associated manifest. Changesets are also stored in revlog (see the file .hg/00changelog.d) and an associated index (.hg/00changelog.i).

Here's what the internal representation of a changeset looks like:

$ hg debugdata .hg/00changelog.d 1208
1102691ceab8c8f278edecd80f2e3916090082dd <- the corresponding manifest hash
mpm@selenic.com <- the committer
1126146623 25200 <- the date, in seconds since the epoch, and seconds offset from UTC
mercurial/commands.py <- the list of changed files, followed by the commit message

Clean up local clone file list 

We now use an explicit list of files to copy during clone so that we
don't copy anything we shouldn't.

Putting it all together

We now have enough information to see how this all works together. To look up a given revision of a file:

look up the changeset in the changeset index
reconstruct the changeset data from the revlog
look up the manifest nodeid from the changeset in the manifest index
reconstruct the manifest for that revision
find the nodeid for the file in that revision
look up the revision of that file in the file's index
reconstruct the file revision from the revlog

If we want to go the other way and find the changeset associated with a given file revision, we follow the linkrev.

   .--------linkrev-------------.
   v                            |
.---------.    .--------.    .--------.
|changeset| .->|manifest| .->|file    |---.
|index    | |  |index   | |  |index   |   |--.
`---------' |  `--------' |  `--------'   |  |
    |       |      |      |     | `-------'  |
    V       |      V      |     V    `-------'
.---------. |  .--------. |  .---------.
|changeset|-'  |manifest|-'  |file     |
|data     |    |data    |    |revision |
`---------'    `--------'    `---------'

Tracking Working Directory State

The other piece of Mercurial is the working directory. Mercurial tracks various information about the working directory:

what revision(s) are currently checked out
what files have been copied or renamed
what files are controlled by Mercurial

For each file that Mercurial controls, we record the following information:

its size
its mode
its modification time
its "state"

The states that are tracked are:

n - normal
a - added
r - removed
m - 3-way merged

With this information, we can quickly determine what files in the working directory have changed.

Here's an example of the dirstate:

$ hg parents -q
1224:cc61d366bc3b
$ hg debugstate
n 644        168 08/19/05 17:42:17 .hgignore
n 644        412 08/20/05 01:46:57 .hgtags
n 644       1328 08/26/05 23:22:20 CONTRIBUTORS
n 644      17992 06/30/05 10:19:51 COPYING
n 644        459 08/24/05 00:38:20 MANIFEST.in
n 644        512 08/24/05 00:35:02 Makefile
n 644        232 06/30/05 10:19:51 PKG-INFO
n 644       2736 08/20/05 00:48:33 README
n 644       1676 06/30/05 10:19:51 comparison.txt
n 644       3711 12/31/69 15:59:59 contrib/bash_completion
n 711       1305 07/01/05 15:04:52 contrib/buildrpm
n 755       8300 08/16/05 16:03:59 contrib/convert-repo
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-cat-file
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-diff-tree
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-rev-list
n 644         69 06/30/05 10:19:51 contrib/git-viz/git-rev-tree
n 644        457 06/30/05 10:19:51 contrib/git-viz/hg-viz
n 755       8039 08/16/05 16:03:59 contrib/hgit
n 755      40043 06/30/05 10:19:51 contrib/hgk
...

Merging

Branching and merging

Every working directory is potentially a branch and every user effectively works in their own branch. When Mercurial checks out a branch, it remembers the changeset that directly led to it so that the next checkin will have the correct parent.

To merge two branches, you check out their heads into the same working directory, which also performs a merge, and then check in (commit) the result once you're happy with the merge. The resulting checkin has two parents.

Mercurial decides when a merge is necessary by first determining whether the working directory contains uncommitted changes. This determination effectively turns the working directory into a branch of the checked-in version on which it is based. If the working directory is a direct ancestor or descendent of the second version that we're attempting to checkout, Mercurial replaces the working-directory version with the new version. Otherwise it merges the two versions.

Graph merging

Merging a pair of directed acyclic graphs (DAGs) -- the family tree of the file history -- requires determining whether nodes in different graphs correspond. Comparing the node contents (or hashes of the contents) is incorrect because it ignores the history.

However, using the nodeid avoids this error because the nodeid describes the file's contents and its graph position relative to the root. A merge simply checks whether each nodeid in graph A is in graph B and vice versa (for example using a hash table), and Mercurial adds the new nodes to the append-only revlog.

Merging manifests

To merge manifests, first compare them and decide which files need to be added, deleted, and merged.

For each file to be merged, perform a graph merge and resolve conflicts as above. It's important to merge files using per-file DAGs rather than just changeset-level DAGs as this diagram illustrates:

 M   M1   M2

   AB
    |`-------v     M2 clones M (mainline)
   aB       AB     file A is change in mainline
    |`---v  AB'    file B is changed in M2
    |   aB / |     M1 clones M
    |   ab/  |     M1 changes B
    |   ab'  |     M1 merges from M2, changes to B conflict
    |    |  A'B'   M2 changes A
     `---+--.|
         |  a'B'   M2 merges from mainline, changes to A conflict
         `--.|
            ???    depending on which ancestor we choose, we will
                   have to redo A hand-merge, B hand-merge, or both
                   but if we look at the files independently, the
                   merge is easy

The result is a merged version in the working directory, waiting for checkin.

Rollback

When committing or merging, Mercurial adds the changeset entry last. Mercurial keeps a transaction log of the name of each file touched and its length prior to the transaction. On abort, it truncates each file to its prior length. This simplicity is one benefit of making revlogs append-only. The transaction journal also allows an undo operation.

Merging between repositories

A key feature of Mercurial is its ability to merge between independent repositories in a decentralized fashion. Each repository can act as a read-only server or as a client. A client pulls from the server all branches that it has not seen and adds them to its graph. This pull is done in two steps:

Searching for new roots.
- This part begins by finding all new heads and searching backwards from those heads to the first unknown nodes in their respective branches. These nodes are the roots used to calculate the changegroup: the set of all changesets starting at those roots. Mercurial takes pains to make this search efficient in both bandwidth and round-trips.
Pulling a changegroup.
- Once the roots are found, the changegroup can be transferred as a single streaming transfer. This transfer is organized as an ordered set of deltas for changesets, manifests, and files. Large chunks of deltas can be directly added to the repository without unpacking so the pull is quick.

See WireProtocol for more details.

-  ⇤ ← Revision 21 as of 2007-02-17 02:04:25 → 
  Size: 15446
  Editor: GoldieWuestenberg
  Comment:
+   ← Revision 22 as of 2007-02-17 10:02:29 → ⇥
  Size: 13330
  Editor: ThomasArendsenHein
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 131:
 Clean up local clone file list
 Line 309:
- [http://lister.100freemb.com/29.html webcam girls stripping] | [http://brickworks.ibnsites.com/30.html jeune webcam lesbian] | [http://lister.100freemb.com/89.html sexy webcam strip] | [http://magmas.1accesshost.com/82.html shemale asians domination] | [http://markab.dreamstation.com/40.html deep kissing hiv] | [http://bobcat.741.com/48.html kissing your pussy] | [http://spoonfed.freewebsitehosting.com/16.html innocent teen cum] | [http://incipiency.envy.nu/13.html transexual pic] | [http://handed.dreamstation.com/70.html porn bisexual] | [http://hemostatic.freewebpages.org/83.html young naked virgins] | [http://lister.100freemb.com/67.html hot webcam directory] | [http://centimes.100freemb.com/86.html gay live webcams] | [http://suffixed.100freemb.com/13.html florida live webcam] | [http://ctenophore.angelcities.com/65.html lesbian interracial toy 1] | [http://reforming.ibnsites.com/59.html hot webcam milf] | [http://subheading.g0g.net/94.html lesbian panty lickers] | [http://suffixed.100freemb.com/80.html mature dildo mpg] | [http://wriggle.1sweethost.com/92.html sleeping with grandma] | [http://markab.dreamstation.com/80.html girls on webcam] | [http://drygoods.9cy.com/40.html free unmonitored webcams] | [http://rehabbing.1sweethost.com/50.html xxx webcam] | [http://travesties.fcpages.com/86.html creative webcam nx] | [http://lance.ibnsites.com/54.html myrtle beach webcams] | [http://prescient.envy.nu/65.html ejaculation busty] | [http://buckshots.741.com/42.html lost webcam nudes] | [http://blaspheme.00freehost.com/8.html white girlsblack dick] | [http://cementa.kogaryu.com/80.html katie price vid] | [http://suffixed.100freemb.com/54.html bukkake free] | [http://tobit.100freemb.com/5.html facial extremes] | [http://homepage.mac.com/tormenting/52.html red head sybian] | [http://homepage.mac.com/buskined/36.html teen shemale sisters] | [http://homepage.mac.com/muscular1/67.html arse breed] | [http://suffixed.100freemb.com/46.html lactating female] | [http://homepage.mac.com/roadies/96.html masturbating in boxers] | [http://landward.freecities.com/14.html internal ejaculation]