Lazy fetching is the idea that you can pull only the revision history that is relevant to your own local changes. This cuts down on bandwidth and disk usage, and lowers the barrier for new developers: they can simply get a copy of the code, add a patch, and push that patch to others or have them pull it. Normally you have to request every revision (or changeset) all the way back to the very first one ("very first" being a relatively arbitrary distinction) before you can commit new ones. It's important that you not be required to do so, especially for very large repositories with long histories.

My concept of a repository is a linked list of revisions, each one capturing the contents of the working directory at a particular moment in time. Changesets come in because, for small changes, storing only the information needed to transition between two revisions takes up far less space than storing both revisions in full. So for this article at least, a revision is a full copy of all checked-in files, and a changeset is a diff between two revisions.
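To make the distinction concrete, here is a minimal sketch of the two concepts. It assumes a revision can be modelled as a mapping from file path to file contents, and a changeset as a mapping from path to new contents (or `None` for a deleted file). The names `Revision`, `Changeset`, `diff` and `apply` are my own illustration, not the format of any real tool, and a real diff would work at line level rather than whole files.

```python
from typing import Dict, Optional

Revision = Dict[str, str]             # path -> file contents (a full snapshot)
Changeset = Dict[str, Optional[str]]  # path -> new contents, None if deleted


def diff(old: Revision, new: Revision) -> Changeset:
    """Record only what must change to turn `old` into `new`."""
    changes: Changeset = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)  # None means the file was removed
    return changes


def apply(rev: Revision, cs: Changeset) -> Revision:
    """Return a new revision with the changeset applied; `rev` is not modified."""
    result = dict(rev)
    for path, contents in cs.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result
```

With two revisions that differ in a single file, `diff` yields a one-entry changeset, and `apply(old, diff(old, new))` gives back `new`. That is the space saving the paragraph above describes.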

Example repository:

If you have revision A, and all the changesets, then by applying them one at a time you can get to any of the revisions.

If you have revision A, revision B, and B->C, then by applying B->C to B you can get revision C.

If you have revision A, and A->C, but not anything about revision B, you can still get revision C by applying A->C to revision A.

If you have revisions A and C, you can check out either one without applying any changesets. B can't be checked out until you also have either A->B or C->B.

It is worth noting that revision D is exactly the same as revision C, so the two would end up having the same hash. The same goes for revisions A and E. In effect, when you ask for the changeset B->current, the system hashes the working directory. If that hash matches C, it gives you B->C. If the hash does not already exist, it gives you B->D along with a new revision D. Also note that although revision D is identical to revision C, the changesets that reach it, A->C and B->C, are not equivalent. Each is a different diff, with a different hash. Changesets B->E and B->A would be the same, though in that case E would never exist, since you'd just use the existing revision A.
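A small sketch of that hashing behaviour, under some loud assumptions: revisions are path-to-contents mappings, the hash is SHA-256 over a canonical JSON dump, and `content_hash`, `record`, `diff` and `store` are hypothetical names of mine rather than anything a real tool exposes. The file names and contents are made up for the example.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

Revision = Dict[str, str]
Changeset = Dict[str, Optional[str]]

store: Dict[str, Revision] = {}  # revision hash -> snapshot


def content_hash(obj: dict) -> str:
    """Hash a revision or changeset by its contents only (assumed scheme)."""
    canonical = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def diff(old: Revision, new: Revision) -> Changeset:
    """Whole-file diff: path -> new contents, or None if the file is gone."""
    return {p: new.get(p) for p in old.keys() | new.keys()
            if old.get(p) != new.get(p)}


def record(working_dir: Revision) -> Tuple[str, bool]:
    """Store the working directory unless an identical revision already exists."""
    rev_hash = content_hash(working_dir)
    is_new = rev_hash not in store
    if is_new:
        store[rev_hash] = dict(working_dir)
    return rev_hash, is_new


A = {"a.txt": "1", "b.txt": "1"}
B = {"a.txt": "2", "b.txt": "1"}
C = {"a.txt": "2", "b.txt": "2"}
hash_a, _ = record(A)
hash_b, _ = record(B)
hash_c, _ = record(C)

# "D" has the same contents as C, so no new revision is created.
hash_d, d_is_new = record({"a.txt": "2", "b.txt": "2"})
```

Here `hash_d` equals `hash_c` and `d_is_new` is `False`, while `content_hash(diff(A, C))` and `content_hash(diff(B, C))` differ because the two diffs touch different files.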

So this is a more accurate summation of the example repository:

Now, where's the initial revision of this repository? It's completely circular! You may as well say A is the root, or B, or C. That's the key to lazy fetching: there is no initial revision to a repository. If you took the current revision from one repository and the initial revision from another, you could make the former the parent of the latter simply by computing the changeset from the former to the latter. Suddenly your "initial revision" is not an initial revision. But despite rebasing in this fashion, the contents of the initial revision itself remain unchanged.

If I guess right, Mercurial does something like the following:

and calls it a history of 4 changesets.

If you just chop off the former part and have

you can't compute any sort of working directory. You don't have B, which is needed to apply any of the B-> changesets. And you can't reconstruct B, since you have neither revision A nor A->B.

However, whatever remote you fetch from that sends you B->C and B->A will either have revision B or have a way to compute it. Nobody needs to store changesets for which the parent revision is missing. So when you pull, instead of sending you A, A->B, B->C and B->A, the remote end can calculate B itself by applying A->B to A. Then it can send you B, B->C and B->A, and you will be able to apply either of those changesets because you have a pristine copy of B. The remote end could keep B around or not. The important part is that it can send B to you, instead of having to send you revision A and every changeset that descends from it.
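Here is a rough sketch of what the remote end might do. Everything in it is an assumption made for illustration: the `Remote` class, its `materialize` and `bundle_from` methods, the whole-file changeset format, and the fact that `materialize` only looks one changeset back. It is not a description of how Mercurial or any other tool actually serves a pull.

```python
from typing import Dict, Optional, Tuple

Revision = Dict[str, str]
Changeset = Dict[str, Optional[str]]
Edge = Tuple[str, str]  # (parent revision name, child revision name)


def apply(rev: Revision, cs: Changeset) -> Revision:
    """Return a copy of `rev` with the changeset applied."""
    result = dict(rev)
    for path, contents in cs.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result


class Remote:
    """Hypothetical server that stores some snapshots and some changesets."""

    def __init__(self, revisions: Dict[str, Revision],
                 changesets: Dict[Edge, Changeset]) -> None:
        self.revisions = dict(revisions)
        self.changesets = dict(changesets)

    def materialize(self, name: str) -> Revision:
        """Return a stored snapshot, or rebuild it from a stored parent (one hop)."""
        if name in self.revisions:
            return self.revisions[name]
        for (src, dst), cs in self.changesets.items():
            if dst == name and src in self.revisions:
                return apply(self.revisions[src], cs)
        raise KeyError(f"cannot reconstruct revision {name!r}")

    def bundle_from(self, base: str) -> Tuple[Revision, Dict[Edge, Changeset]]:
        """Send a pristine copy of `base` plus only the changesets that start there."""
        snapshot = self.materialize(base)
        outgoing = {edge: cs for edge, cs in self.changesets.items()
                    if edge[0] == base}
        return snapshot, outgoing


# The remote stores only snapshot A plus three changesets.
remote = Remote(
    revisions={"A": {"a.txt": "1", "b.txt": "1"}},
    changesets={
        ("A", "B"): {"a.txt": "2"},
        ("B", "C"): {"b.txt": "2"},
        ("B", "A"): {"a.txt": "1"},
    },
)
base_b, from_b = remote.bundle_from("B")
revision_c = apply(base_b, from_b[("B", "C")])
```

The client never receives A or A->B, yet it can still check out C, and it can get back to A through B->A.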

If revisions are nodes, changesets are the edges of a graph connecting those nodes.
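Seen that way, "can I check out revision X?" becomes a reachability question: is there a path of changesets from some revision you hold in full to X? The following is a minimal sketch using a breadth-first search. The function name `changeset_path` and the representation of changesets as `(parent, child)` name pairs are my own assumptions.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (parent revision name, child revision name)


def changeset_path(target: str, held: Set[str],
                   edges: Iterable[Edge]) -> Optional[List[Edge]]:
    """Return the changesets to apply, in order, to reach `target`.

    `held` names the revisions available as full snapshots. The result is an
    empty list if `target` is already held, and None if it is unreachable.
    """
    outgoing: Dict[str, List[str]] = {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)

    # Start the search from every snapshot we already have.
    queue: Deque[Tuple[str, List[Edge]]] = deque(
        (name, []) for name in sorted(held))
    seen = set(held)
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt)]))
    return None
```

Holding only A along with A->B and B->C, the path to C is A->B followed by B->C. Holding nothing but the changesets B->C and B->A, there is no path at all, which is the "chopped off" situation described earlier.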