Shallow Clones

I (PeterArrenbrecht) am working on adding what I call shallow clones to Hg. These are clones that do not pull the entire history from a server, but only a subset starting at a given revision. The idea is that one often does not need the entire history to work on new features, but only a recent part of it. So shallow clones can save bandwidth and disk space.

I know this overlaps with a GSoC project. But the main focus of that project is narrow clones, meaning cloning a subset of files, not a subset of history.

The work is related to TrimmingHistory and OverlayRepository, but the approach is different. In contrast to the punching in TrimmingHistory, I do not want to have to keep the entire index around, only the part of it that we're interested in. This is done by calling localrepo.changegroup() with a suitable base revision list and a new flag that makes it return the initial revisions in full (not as deltas). For the moment I do not attempt to dynamically pull missing data as needed. A shallow clone is a shallow clone.

This means we shall have missing parent revs. These I currently simply set to nullrev. The main problem will then be to ensure we don't bungle merges because of missing revs.
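To make the effect of missing parents concrete, here is a toy sketch in plain Python. It is not the patch and does not use Mercurial's API: the function name `shallow_clone`, the dictionary representation of history, and the rule "keep the base rev and everything descending from it" are all assumptions made for illustration.

```python
NULLREV = -1  # Mercurial's marker for "no parent"


def shallow_clone(parents, base):
    """Build a toy shallow clone (illustrative only, not Mercurial code).

    `parents` maps rev -> (p1, p2) in the full repository, with revs
    numbered so that parents always come before children.
    Returns (clone, shallow): `clone` maps the renumbered revs to their
    parents, and `shallow` is the set of cloned revs that lost a parent.
    """
    # Assumption: keep `base` and every rev that descends from it.
    keep = set()
    for rev in sorted(parents):
        if rev == base or any(p in keep for p in parents[rev]):
            keep.add(rev)

    renumber = {old: new for new, old in enumerate(sorted(keep))}
    clone = {}
    shallow = set()
    for old in sorted(keep):
        ps = parents[old]
        # Parents that were not pulled are replaced by NULLREV.
        clone[renumber[old]] = tuple(renumber.get(p, NULLREV) for p in ps)
        if any(p != NULLREV and p not in keep for p in ps):
            shallow.add(renumber[old])
    return clone, shallow


# Full history: 0-1-2-3 is linear, 4 branches off 1, 5 merges 3 and 4.
full = {
    0: (NULLREV, NULLREV),
    1: (0, NULLREV),
    2: (1, NULLREV),
    3: (2, NULLREV),
    4: (1, NULLREV),
    5: (3, 4),
}
clone, shallow = shallow_clone(full, base=2)
```

In this example the clone contains the old revs 2, 3 and 5. Old rev 2 loses its only parent, and the merge loses its second parent (old rev 4), so both end up marked as shallow.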

The new aspect with shallow clones is what I call "disconnected" heads. These are heads that would normally be related, but their common ancestor is missing in the shallow clone. I propose that merge no longer accept unrelated heads, that is, heads whose common ancestor is nullrev, unless --force is specified. This ensures we do not accidentally merge disconnected heads, as they will appear unrelated to the shallow clone. If necessary for backwards compatibility, we can make merge behave in this way only for shallow clones. They can be identified by their changelog having shallow revs. We might also issue warnings or abort if the ancestor-detecting code in merge touches shallow revs (I have not experimented with this yet).
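The proposed merge rule can be sketched as follows. Again, this is a toy model rather than Mercurial's merge code: `check_merge` and `MergeAborted` are hypothetical names, and picking the highest-numbered common ancestor is a simplification of what the real ancestor code does.

```python
NULLREV = -1


class MergeAborted(Exception):
    """Raised when a merge of apparently unrelated heads is refused."""


def ancestors(parents, rev):
    """Return the set of ancestors of `rev`, including `rev` itself."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r == NULLREV or r in seen:
            continue
        seen.add(r)
        stack.extend(parents[r])
    return seen


def common_ancestor(parents, a, b):
    """Return the highest-numbered common ancestor, or NULLREV if none."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return max(common) if common else NULLREV


def check_merge(parents, a, b, force=False):
    """Refuse to merge heads whose common ancestor is NULLREV unless forced."""
    anc = common_ancestor(parents, a, b)
    if anc == NULLREV and not force:
        raise MergeAborted(
            "heads %d and %d have no common ancestor; use force to merge" % (a, b)
        )
    return anc


# A shallow clone in which revs 0 and 1 both lost their parent:
# heads 1 and 2 are "disconnected" and look unrelated.
disconnected = {0: (NULLREV, NULLREV), 1: (NULLREV, NULLREV), 2: (0, NULLREV)}

# A clone in which both heads still share rev 0.
connected = {0: (NULLREV, NULLREV), 1: (0, NULLREV), 2: (0, NULLREV)}
```

With these definitions, `check_merge(connected, 1, 2)` returns 0, `check_merge(disconnected, 1, 2)` raises `MergeAborted`, and passing `force=True` lets the second merge through with NULLREV as the ancestor.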

It may be necessary to keep the desired base rev used when cloning in .hg/hgrc so that subsequent pulls can specify it again. This is to avoid pulling undesired history along with new heads that reference it.

I'm using bit 0 of the revlog index entry flags to flag shallow revs, that is, revs with a missing parent rev. This is currently used by hg verify to report such revs as errors with a meaningful message, and by revlog.revision() to skip the hash check.
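The hash check has to be skipped because, as far as I understand the revlog format, a node id is the SHA-1 of the two sorted parent ids followed by the revision text, so replacing a missing parent with the null id changes the hash. The sketch below illustrates this with a stand-alone model; the constant name `REVIDX_SHALLOW` and the function `check_revision` are hypothetical and not taken from the patch.

```python
import hashlib

NULLID = b"\0" * 20        # id of the null revision
REVIDX_SHALLOW = 1 << 0    # hypothetical name for bit 0 of the index flags


def node_hash(text, p1, p2):
    """Compute a revlog-style node id: SHA-1 of sorted parent ids + text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


def check_revision(text, node, p1, p2, flags):
    """Return True if the revision passes the integrity check.

    Revisions flagged as shallow skip the check, because their stored
    parents no longer match the parents the node id was computed from.
    """
    if flags & REVIDX_SHALLOW:
        return True
    return node_hash(text, p1, p2) == node


# A revision whose real parent is not present in the shallow clone.
real_parent = node_hash(b"first", NULLID, NULLID)
node = node_hash(b"second", real_parent, NULLID)
```

Here `check_revision(b"second", node, NULLID, NULLID, 0)` fails, since the parent was replaced by the null id, while the same call with `REVIDX_SHALLOW` set in the flags passes.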

So far, I have a very basic test scenario working: linear history, do a shallow clone, log, update, verify. Other scenarios I am going to test include:

The patch queue (still very much a work in progress) can be obtained from

and is (currently) based on 1603bba96411 from CrewRepository.