Differences between revisions 1 and 37 (spanning 36 versions)
Revision 1 as of 2005-08-26 00:57:52
Size: 7279
Editor: waste
Comment:
Revision 37 as of 2013-08-26 14:31:30
Size: 487
Editor: Aaron80B
Comment:
= Mercurial's design =

The guts of Mercurial.

== Data structures ==

Mercurial uses a few fundamental objects.

=== Nodeids ===

Nodeids are unique ids that represent the contents of a file ''and'' its
position in the project history. For now they are computed using the
SHA1 hash function, which generates 160 bits (40 hex digits). If you
modify a file, commit the change, and then modify it to restore the
original contents, the contents are the same but the history is
different, so the file will get a new nodeid. This
history-sensitivity is obtained by calculating the nodeid from the
concatenation of the parent nodeids with the file's contents.
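A minimal Python sketch of the idea. Sorting the two parents before hashing, so that swapping them does not change the result, is to my knowledge what Mercurial does; the function name `nodeid` and the `NULLID` constant are illustrative names, not Mercurial's API.

{{{#!python
import hashlib

NULLID = b"\0" * 20  # stand-in nodeid meaning "no parent": 20 zero bytes

def nodeid(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
    """SHA1 over both parent nodeids (sorted, so order is irrelevant) and the contents."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()

# Same contents, different history: v1 and v3 get different nodeids.
v1 = nodeid(b"hello\n")
v2 = nodeid(b"hello, world\n", v1)
v3 = nodeid(b"hello\n", v2)
}}}

`v1.hex()` gives the familiar 40-hex-digit form of the 20-byte digest.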

=== Revlogs ===

A '''revlog''', for example ''.hg/data/somefile.d'', is the most important
data structure and represents all versions of a file. Each version is
stored either compressed in its entirety or as a compressed binary
delta (difference) relative to the preceding version in the revlog.
Whether to store a full version is decided by how much data would be
needed to reconstruct the file: once the chain of deltas since the last
full version grows too large compared to the file itself, a full version
is stored. This system ensures that Mercurial does not need huge
amounts of data to reconstruct any version of a file, no matter how
many versions are stored.
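The decision can be sketched as a size check. I believe early Mercurial used a factor of two (store a full version once the bytes to read would exceed twice the text size), but treat `choose_storage` and its `factor` parameter as assumptions; computing the delta itself is out of scope, so it is passed in.

{{{#!python
import zlib

def choose_storage(text: bytes, delta: bytes, chain_bytes: int, factor: int = 2):
    """Decide whether to store `text` in full or as `delta` against the previous version.

    chain_bytes: compressed bytes stored since the last full version, i.e. what
    must already be read before this delta could be applied.
    """
    compressed_delta = zlib.compress(delta)
    # Reading the existing chain plus this delta must stay within `factor`
    # times the size of the text; otherwise start a new chain with a full copy.
    if chain_bytes + len(compressed_delta) > factor * len(text):
        return "full", zlib.compress(text)
    return "delta", compressed_delta
}}}

The threshold bounds the work of any single reconstruction at a small multiple of the file size, regardless of history length.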

Reconstruction requires only a single read, provided Mercurial knows
where to start reading and how much to read. Each revlog therefore has
an '''index''', for example ''.hg/data/somefile.i'', containing one
fixed-size record for each version. The record contains:

 * the nodeid of the file version
 * the nodeids of its parents
 * how many bytes to read from the revlog
 * the offset in the revlog saying where to begin reading
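A fixed-size record holding those fields can be packed with Python's `struct` module. The field order and widths below are a hypothetical layout chosen for illustration, not Mercurial's on-disk index format.

{{{#!python
import struct

# Hypothetical fixed-size record, for illustration only (not Mercurial's
# on-disk layout): data offset (8 bytes), data length (4 bytes),
# parent 1, parent 2 and the version's own nodeid (20 bytes each).
RECORD = struct.Struct(">QI20s20s20s")

def pack_record(offset: int, length: int, p1: bytes, p2: bytes, node: bytes) -> bytes:
    return RECORD.pack(offset, length, p1, p2, node)

def read_record(index: bytes, rev: int) -> tuple:
    """Fixed-size records mean record `rev` starts at byte rev * RECORD.size."""
    return RECORD.unpack_from(index, rev * RECORD.size)

NULL = b"\0" * 20
first, second = b"\x01" * 20, b"\x02" * 20
index = pack_record(0, 120, NULL, NULL, first) + pack_record(120, 35, first, NULL, second)
offset, length, p1, p2, node = read_record(index, 1)
}}}

Because every record has the same size, finding the record for revision ''n'' is a multiplication, not a search.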

With one read of the index to fetch the record and then one read of
the revlog, Mercurial can reconstruct any version of a file in time
proportional to the file size.
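Since the entries from the last full version up to the wanted one lie next to each other in the revlog, that single read fetches all of them, and the deltas are then applied in order. The in-memory sketch below assumes a readable hunk format of `(start, end, replacement)` tuples; this is a hypothetical stand-in, not Mercurial's binary delta encoding.

{{{#!python
def apply_delta(text: bytes, hunks) -> bytes:
    """Apply hunks (start, end, data), sorted by start, each replacing text[start:end]."""
    pieces, pos = [], 0
    for start, end, data in hunks:
        pieces.append(text[pos:start])   # unchanged bytes before this hunk
        pieces.append(data)              # replacement bytes
        pos = end
    pieces.append(text[pos:])            # unchanged tail
    return b"".join(pieces)

def reconstruct(entries, rev: int) -> bytes:
    """entries[i] is ("full", text) or ("delta", hunks) for revision i."""
    base = rev
    while entries[base][0] != "full":    # walk back to the nearest full version
        base -= 1
    text = entries[base][1]
    for _kind, hunks in entries[base + 1 : rev + 1]:  # every entry here is a delta
        text = apply_delta(text, hunks)
    return text

entries = [
    ("full", b"hello world"),
    ("delta", [(0, 5, b"HELLO")]),
    ("delta", [(6, 11, b"there")]),
]
}}}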

Revlogs and their indices are append-only, so adding a new version
requires only O(1) seeks.
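Appending a version touches only the ends of the two files: the new data goes after the existing data, and its offset is simply the old file size. A sketch with a hypothetical two-field record (offset, length):

{{{#!python
import os
import struct
import tempfile

RECORD = struct.Struct(">QI")  # hypothetical minimal index record: offset, length

def append_version(data_path: str, index_path: str, chunk: bytes) -> int:
    """Append one version's data and its index record; existing bytes are never rewritten."""
    # The new chunk lands at the end, so its offset is the current file size.
    offset = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    with open(data_path, "ab") as data:   # append mode: every write goes to the end
        data.write(chunk)
    with open(index_path, "ab") as index:
        index.write(RECORD.pack(offset, len(chunk)))
    return offset

with tempfile.TemporaryDirectory() as d:
    data_file = os.path.join(d, "somefile.d")
    index_file = os.path.join(d, "somefile.i")
    offsets = [append_version(data_file, index_file, c) for c in (b"abc", b"defg")]
    index_size = os.path.getsize(index_file)
}}}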

Revlogs are also used for '''manifests''' and '''changesets'''.

=== Manifests ===

A manifest describes the state of a project by listing each file and
its nodeid to specify which version. Recreating a particular state
means simply looking up its manifest and reconstructing the listed
file versions from their revlogs. The manifest is conceptually a
file. All of its versions, which collectively represent the entire
project history, are stored in a revlog (see the file
''.hg/00manifest.d'') and an associated index (''.hg/00manifest.i'').
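As far as I know, Mercurial's manifest text lists one file per line: the path, a NUL byte, then the 40-hex-digit nodeid, optionally followed by flags. The sketch assumes that layout and ignores flags; `read_version` is a hypothetical callback standing in for reconstructing a version from the file's revlog.

{{{#!python
def parse_manifest(text: bytes) -> dict:
    """Map each path to the hex nodeid of its file version."""
    files = {}
    for line in text.splitlines():
        # Each line: path, a NUL byte, then 40 hex digits (trailing flags ignored).
        path, rest = line.split(b"\0", 1)
        files[path.decode()] = rest[:40].decode()
    return files

def checkout(manifest_text: bytes, read_version) -> dict:
    """Recreate a project state: one revlog lookup per manifest entry."""
    return {path: read_version(path, node)
            for path, node in parse_manifest(manifest_text).items()}

manifest = b"README\0" + b"a" * 40 + b"\n" + b"src/main.py\0" + b"b" * 40 + b"\n"
store = {("README", "a" * 40): b"docs\n", ("src/main.py", "b" * 40): b"print(1)\n"}
state = checkout(manifest, lambda path, node: store[(path, node)])
}}}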

=== Changesets ===

A changeset lists all files changed in a checkin along with a change
description and metadata like user and date. It also contains a
nodeid of the resulting manifest. Given enough history information, a
changeset can be converted into a manifest. Given enough history
information and the metadata, a manifest can be converted into a
changeset. So manifests are redundant, but maintaining both
representations of the project history speeds up Mercurial.
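A changeset can be pictured as a small text record: a header naming the manifest, user, date and changed files, then a blank line and the description. This sketch is loosely modelled on Mercurial's changelog entries as I understand them; the `Changeset` class is hypothetical and omits extra metadata fields.

{{{#!python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Changeset:
    manifest: str                    # hex nodeid of the resulting manifest
    user: str
    date: str                        # "<seconds since epoch> <timezone offset>"
    files: List[str] = field(default_factory=list)  # files changed in this checkin
    description: str = ""

    def serialize(self) -> bytes:
        # Header lines, a blank line, then the free-form description.
        header = [self.manifest, self.user, self.date, *sorted(self.files)]
        return ("\n".join(header) + "\n\n" + self.description).encode()

    @classmethod
    def parse(cls, data: bytes) -> "Changeset":
        header, _, description = data.decode().partition("\n\n")
        manifest, user, date, *files = header.split("\n")
        return cls(manifest, user, date, files, description)

cs = Changeset("c" * 40, "alice", "0 0", ["b.txt", "a.txt"], "initial import")
}}}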

== Merging ==

These data structures are designed for '''merging''', the fundamental
operation in a distributed SCM that encourages branching.

=== Graph merging ===

Merging a pair of directed acyclic graphs (D``AGs) -- the family tree of
the file history -- requires determining whether nodes in different
graphs correspond. Comparing the node contents (or
hashes of the contents) is incorrect because it ignores the history.

However, using the nodeid avoids this error because the nodeid
describes the file's contents and its graph position relative to the
root. A merge simply checks whether each nodeid in graph A is in
graph B and vice versa (for example using a hash table), and Mercurial
adds the new nodes to the append-only revlog.
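The membership test is a plain set lookup. In this sketch (function name hypothetical) both graphs are lists of nodeids in revlog order; because revlogs are append-only, parents always precede their children, so the missing nodes can be appended in the order they appear remotely.

{{{#!python
def graph_merge(local_nodes: list, remote_nodes: list) -> list:
    """Append to `local_nodes` every remote nodeid it lacks; return the new ones."""
    known = set(local_nodes)               # hash table: O(1) membership tests
    new = [n for n in remote_nodes if n not in known]
    local_nodes.extend(new)                # append-only: nothing is rewritten
    return new

local = ["n0", "n1", "n2"]
remote = ["n0", "n1", "n3", "n4"]
added = graph_merge(local, remote)
}}}

Running the same merge in the other direction adds `n2` to the remote side; a second merge in either direction finds nothing new.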

=== Branching and merging ===

Every working directory is potentially a branch and every user
effectively works in their own branch. When Mercurial checks out a
branch, it remembers the changeset that directly led to it so that the
next checkin will have the correct parent.

To merge two branches, you check out their heads into the same working
directory, which also performs a merge, and then check in (commit) the
result once you're happy with the merge. The resulting checkin has
two parents.

Mercurial decides when a merge is necessary by first determining
whether the working directory contains uncommitted changes. This
determination effectively turns the working directory into a branch of
the checked-in version on which it is based. If the working directory
is a direct ancestor or descendant of the second version that we're
attempting to check out, Mercurial replaces the working-directory
version with the new version. Otherwise it merges the two versions.
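The ancestor-or-descendant test is a graph walk over parent links. In the sketch below, `parents` maps each node to a tuple of its parents, and uncommitted changes are modelled, following the text, as a hypothetical extra node `"wd"` whose parent is the checked-in version.

{{{#!python
def ancestors(parents: dict, node: str) -> set:
    """`node` plus everything reachable through parent links."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen

def update_action(parents: dict, current: str, target: str) -> str:
    """'replace' if one node is an ancestor of the other, else 'merge'."""
    if current in ancestors(parents, target) or target in ancestors(parents, current):
        return "replace"
    return "merge"

# a <- b <- c is linear; d branches off a.
history = {"b": ("a",), "c": ("b",), "d": ("a",)}
# Uncommitted changes on top of b become a new branch "wd".
with_changes = {**history, "wd": ("b",)}
}}}

With a clean working directory at `b`, moving to `c` is a simple replacement; with uncommitted changes, `wd` is neither an ancestor nor a descendant of `c`, so a merge is needed.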

=== Merging manifests ===

To merge manifests, first compare them and decide which files need to
be added, deleted, and merged.
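The text does not say how the comparison classifies each file. One approach, assumed here (I believe Mercurial also compares against a common ancestor), is to look at each path in both manifests and in the manifest of their common ancestor. The action names are hypothetical; `"get"` means taking the other side's version of a file only it changed.

{{{#!python
def plan_manifest_merge(local: dict, other: dict, ancestor: dict) -> dict:
    """Each manifest maps path -> nodeid. Returns path -> action for paths needing work."""
    actions = {}
    for path in sorted(set(local) | set(other)):
        l, o, a = local.get(path), other.get(path), ancestor.get(path)
        if l == o or o == a:
            continue                       # identical, or changed only on our side
        if l == a:                         # changed only on the other side
            if o is None:
                actions[path] = "delete"
            elif l is None:
                actions[path] = "add"
            else:
                actions[path] = "get"      # take the other side's version
        elif l is not None and o is not None:
            actions[path] = "merge"        # both changed: per-file graph merge
        else:
            actions[path] = "conflict"     # changed on one side, deleted on the other
    return actions

ancestor = {"a": "1", "b": "1", "c": "1"}
local = {"a": "2", "b": "1", "c": "1"}
other = {"a": "3", "b": "2", "d": "1"}
plan = plan_manifest_merge(local, other, ancestor)
}}}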

For each file to be merged, perform a graph merge and resolve
conflicts as above. It's important to merge files using per-file D``AGs
rather than just changeset-level D``AGs as this diagram illustrates:

{{{
   M   M1  M2

  AB
   |`-------v     M2 clones M (mainline)
  aB       AB     file A is changed in mainline
   |`---v  AB'    file B is changed in M2
   |   aB / |     M1 clones M
   |   ab/  |     M1 changes B
   |   ab'  |     M1 merges from M2, changes to B conflict
   |    |  A'B'   M2 changes A
    `---+--.|
        |  a'B'   M2 merges from mainline, changes to A conflict
        `--.|
          ???     depending on which ancestor we choose, we will
                  have to redo A hand-merge, B hand-merge, or both
                  but if we look at the files independently, the
                  merge is easy
}}}

The result is a merged version in the working directory, waiting for
checkin.

=== Rollback ===

When committing or merging, Mercurial adds the changeset entry last, so
a changeset never refers to manifest or file revisions that have not yet
been written. Mercurial keeps a transaction log of the name of each file
touched and its length prior to the transaction. On abort, it truncates
each file to its prior length. This simplicity is one benefit of making
revlogs append-only. The transaction journal also allows an '''undo'''
operation.
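A minimal sketch of the journal-and-truncate scheme. The `Transaction` class is hypothetical, and it keeps the journal in memory for brevity; a real journal has to be written to disk so that an interrupted process can still be rolled back afterwards.

{{{#!python
import os
import tempfile

class Transaction:
    """Remember each touched file's length; on abort, truncate back to it."""

    def __init__(self):
        self.journal = {}   # path -> length before this transaction touched it

    def append(self, path: str, data: bytes) -> None:
        if path not in self.journal:       # record only the pre-transaction length
            self.journal[path] = os.path.getsize(path) if os.path.exists(path) else 0
        with open(path, "ab") as f:
            f.write(data)

    def abort(self) -> None:
        for path, length in self.journal.items():
            os.truncate(path, length)      # append-only files: truncation undoes writes
        self.journal.clear()

with tempfile.TemporaryDirectory() as d:
    revlog = os.path.join(d, "somefile.d")
    with open(revlog, "wb") as f:
        f.write(b"committed")
    tr = Transaction()
    tr.append(revlog, b" half-written")
    tr.abort()
    with open(revlog, "rb") as f:
        restored = f.read()
}}}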

=== Merging between repositories ===

A key feature of Mercurial is its ability to merge between independent
repositories in a decentralized fashion. Each repository can act as a
read-only server or as a client. A client pulls from the server all
branches that it has not seen and adds them to its graph. This pull
is done in two steps:

''1. Searching for new roots.'' This step begins by finding all new heads
and searching backwards from those heads to the first unknown nodes in
their respective branches. These nodes are the roots used to
calculate the '''changegroup''': the set consisting of those roots and
all changesets descending from them. Mercurial takes pains to make this
search efficient in both bandwidth and round-trips.
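The sketch below assumes the client can see the whole remote graph at once, which sidesteps the bandwidth and round-trip concerns the real search is designed around. Roots are taken to be unknown nodes whose parents are all known; both function names are hypothetical.

{{{#!python
def find_new_roots(parents: dict, heads, known: set) -> set:
    """Walk back from the remote heads to the first unknown node on each branch.

    `parents` maps each remote nodeid to a tuple of parents (empty for the
    first revision).
    """
    roots, seen = set(), set()
    stack = [h for h in heads if h not in known]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        unknown = [p for p in parents[node] if p not in known]
        if unknown:
            stack.extend(unknown)          # keep searching backwards
        else:
            roots.add(node)                # all parents known: this is a root
    return roots

def changegroup(order: list, parents: dict, roots: set) -> list:
    """The roots plus all their descendants, in revlog order (parents first)."""
    members = set(roots)
    for node in order:
        if any(p in members for p in parents[node]):
            members.add(node)
    return [n for n in order if n in members]

remote = {"a": (), "b": ("a",), "c": ("b",), "d": ("b",), "e": ("c",)}
roots = find_new_roots(remote, heads=["e", "d"], known={"a", "b"})
group = changegroup(["a", "b", "c", "d", "e"], remote, roots)
}}}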

''2. Pulling a changegroup.'' Once the roots are found, the changegroup
can be sent in a single streaming transfer. This transfer is
organized as an ordered set of deltas for changesets, manifests, and
files. Large chunks of deltas can be added directly to the repository
without unpacking, so the pull is quick.
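One simple way to frame such a stream is length-prefixed chunks, with an empty marker ending each group (changesets, then manifests, then files). This is loosely modelled on Mercurial's changegroup framing as I understand it (a 4-byte big-endian length that counts itself, and a zero length ending a group); the real format also carries per-delta headers and file names, which are omitted here.

{{{#!python
import io
import struct

def write_group(out, chunks) -> None:
    """Write length-prefixed chunks followed by a zero-length terminator."""
    for chunk in chunks:
        out.write(struct.pack(">I", len(chunk) + 4))  # length counts the 4-byte header
        out.write(chunk)
    out.write(struct.pack(">I", 0))

def read_group(inp):
    """Yield chunks until the zero-length terminator."""
    while True:
        (length,) = struct.unpack(">I", inp.read(4))
        if length == 0:
            return
        yield inp.read(length - 4)

stream = io.BytesIO()
for group in ([b"cs-delta-1", b"cs-delta-2"], [b"mf-delta-1"], [b"file-delta-1"]):
    write_group(stream, group)
stream.seek(0)
received = [list(read_group(stream)) for _ in range(3)]
}}}

The receiver never needs to look ahead: each length header says exactly how much to read next, so chunks can be appended to the revlogs as they arrive.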
Design (last edited 2013-10-08 10:26:58 by MattiJagula)