Subrepositories

This feature was introduced in a preliminary form in Mercurial 1.3 and has been improved steadily since then. There are still some commands that lack proper support for sub-repositories, but we will fix them as we come across them and as we figure out how to best make them subrepo-aware.

Subrepositories let you treat a collection of repositories as a group, so that you can clone, commit to, push, and pull a project and its associated libraries together.

For those used to Subversion, this concept is closest to what you can achieve with Subversion directories marked with the svn:externals property. Mercurial 1.5 has support for using Subversion repositories as subrepos.

1. Basic Usage

1.1. Start

To start using subrepositories, you need two repositories, a main repo and a nested repo:

$ hg init main
$ cd main
$ hg init nested
$ echo test > nested/foo
$ hg -R nested add nested/foo
$ hg -R nested commit --message 'Initial commit.'

Now we'll mark nested as a subrepository by creating an entry for it in the special .hgsub file. The first 'nested' is the path in our working dir, and the second is a URL or path to pull from. Here we're simply going to pull from 'nested' using a path relative to main. This says 'anyone who can find our main repo can find the nested repo just by tacking nested onto that path'.

$ echo nested = nested > .hgsub
$ hg add .hgsub
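The relative-path rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Mercurial's actual resolution code: the function name resolve_subrepo_source and the parent locations are made up for the example, and it assumes a source is either a full URL, an absolute path, or a path relative to wherever the parent repository was found.

```python
import posixpath


def resolve_subrepo_source(parent_location, source):
    """Return where a subrepo would be looked up, given its parent's location.

    Simplified illustration only: a source containing '://' or starting
    with '/' is treated as absolute and returned unchanged; anything else
    is appended to the parent's location.
    """
    if "://" in source or source.startswith("/"):
        return source
    return posixpath.join(parent_location.rstrip("/"), source)


# Hypothetical parent locations, for illustration only.
print(resolve_subrepo_source("https://example.com/main", "nested"))
print(resolve_subrepo_source("/home/user/main", "nested"))
print(resolve_subrepo_source("/home/user/main", "https://example.com/nested/repo/path"))
```

With a relative source, the subrepo follows the main repo wherever it is cloned from; with an absolute source, it is always fetched from the same place.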

Note that the nested repository must actually exist for the line in .hgsub to do anything. For instance, if rather than creating a local nested repository you attempt to link to a pre-existing remote one, you must ALSO clone that repository:

$ echo nested = https://example.com/nested/repo/path > .hgsub
$ hg add .hgsub
$ hg clone https://example.com/nested/repo/path nested

If you intend to track something other than the current revision of the default branch, this is also the time to update the subrepo to the desired revision.

As of version 1.5 Mercurial can also support other repository types for your subrepo. For example, if you wanted a subrepo that referred to a Subversion repository, you would do something like this:

$ echo 'nested = [svn]https://example.com/nested/trunk/path' >.hgsub
$ hg add .hgsub
$ svn co https://example.com/nested/trunk/path nested

Currently, Mercurial treats all URLs that do not begin with [<repo type>] as beginning with [hg].
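As a rough illustration of the prefix rule, the following Python sketch splits a .hgsub source into a repository type and a location, falling back to hg when no [type] prefix is present. The function name split_source is hypothetical and this is not Mercurial's own parser.

```python
import re

# Matches an optional "[type]" prefix followed by the rest of the source.
_SOURCE_RE = re.compile(r"^\[([^\]]+)\](.*)$")


def split_source(source):
    """Split a .hgsub source into (repo_type, location)."""
    source = source.strip()
    match = _SOURCE_RE.match(source)
    if match:
        return match.group(1), match.group(2).strip()
    # No explicit prefix: treated as a Mercurial repository.
    return "hg", source


print(split_source("[svn]https://example.com/nested/trunk/path"))
print(split_source("nested"))
```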

1.2. Committing

When we commit, Mercurial will attempt to recursively commit in all defined subrepos and then record their resulting states in a special .hgsubstate file:

$ hg ci -mtest
committing subrepository nested
$ cat .hgsubstate
3f68b2f93426b6966b604536037b5d325ba00741 nested

1.3. Directory structure

At this point of our example, we have the following directory structure:

  main/
      .hg/
      .hgsub
      .hgsubstate
      nested/
          .hg/
          foo

with .hgsub containing

nested = nested

and .hgsubstate containing

3f68b2f93426b6966b604536037b5d325ba00741 nested

1.4. Update

Whenever newer Mercurial versions encounter this .hgsubstate file when updating your working directory, they'll attempt to pull the specified subrepos and update them to the appropriate state:

$ cd ..
$ hg clone main main2
updating working directory
pulling subrepo nested
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cat main2/nested/foo
test

Subrepos may also contain their own subrepos and Mercurial will recurse as necessary.

1.5. Delete

To remove a subrepo from the parent repo, you must delete the subrepo definition from the .hgsub file at the top level of the parent repo. Once you do this, the subrepo tree will show up as a set of unknown files when you run hg status, and you can delete the files from the file system if you like.

2. Caveats

Because this is a complex new feature, it still has a number of rough edges. Most commands, such as diff and status, are currently unaware of subrepositories; only update, commit, and push are subrepo-aware.

Further, there are a number of behaviors that are currently poorly defined or implemented:

3. Internals

The .hgsub format uses the hgrc config format. It reserves a source prefix of [ for future expansion (see below). Future expansion may also use named sections in this file.

The .hgsubstate format is similar to the tags format, in the form <revision><space><path>. This file is not intended to be hand-edited, but will accept any identifier format that Mercurial accepts. It is also automatically merged when necessary. It is separated from .hgsub to keep automatic updates from muddling that file and to keep .hgsub's history tidy. The combined state can be viewed with hg debugsub.
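To make the <revision><space><path> layout concrete, here is a minimal Python sketch that reads .hgsubstate content into a dictionary. The function name parse_hgsubstate is an assumption for illustration; Mercurial's own reader is not shown here.

```python
def parse_hgsubstate(text):
    """Map each subrepo path to the revision recorded for it."""
    state = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first space only, so paths containing spaces survive.
        revision, path = line.split(" ", 1)
        state[path] = revision
    return state


example = "3f68b2f93426b6966b604536037b5d325ba00741 nested\n"
print(parse_hgsubstate(example))
```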

Internally, subrepo states are represented as a hash of path to (source, revision) pairs that combine the elements of the above two files. There is also a new subrepo object type that exposes a limited set of operations on a subrepo. Subrepos can be traversed like this:

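# check whether subrepos are dirty
# (repo is assumed to be an already-opened local repository object)
c = repo['tip']
for s in c.substate:
    # sub() is a method on the changeset context, so call it with the path
    subrepo = c.sub(s)
    print("%s %s" % (s, subrepo.dirty()))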

4. Wanted Feature per Command

The following repo structure will be used to illustrate the expected behavior of the commands:

A
  B
  C
    D

So A has two subrepos (B and C), and C has D as a subrepo.
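The recursion that the following subsections rely on can be pictured with a small Python sketch over this example layout. The SUBREPOS table and the walk function are hypothetical stand-ins for real repositories, and visiting parents before children is only one possible order: commit, for instance, has to commit subrepos before recording their state in the parent.

```python
# Hypothetical in-memory model of the example layout: each repository name
# maps to the list of subrepos it declares.
SUBREPOS = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}


def walk(repo, prefix=""):
    """Yield (path, name) for repo and all nested subrepos, parents first."""
    path = prefix + repo
    yield path, repo
    for child in SUBREPOS[repo]:
        for item in walk(child, path + "/"):
            yield item


for path, name in walk("A"):
    print(path)
```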

4.1. add

Add should behave transparently with respect to subrepos. For example, suppose the user is in repo A and executes this command:

$ hg add

The effect is that files are added in repos A, B, C and D. Suppose the user executes this command when his current directory is A:

$ hg add C/D/foo.c

The effect is that foo.c is added in repo D.

4.2. archive

Archive should archive a revision in the current repo and recurse through subrepos, archiving an appropriate revision in each.

4.3. branch

Current behaviour: A new branch is created in repo A

$ hg branch newBranch

A change is made and committed in repo A. The changeset has been added to newBranch. A change is made and committed in repo B, C or D. The changeset has been added to the default branch in the subrepo.

Should the creation of branches in subrepos be a conscious decision?

$ hg branch -r newBranch

or should the branch command recursively create branches by default?

Desired behaviour: A new branch is created in repo A

$ hg branch newBranch

A change is made and committed in repo A. The changeset has been added to newBranch. A change is made and committed in repo B, C or D. The changeset has been added to newBranch in the subrepo.

4.4. bundle/unbundle

Bundle should bundle changesets in the current repo and recurse through subrepos, bundling appropriate changesets in each. It is unclear whether these bundles should be wrapped up into a super-bundle (a zip or tar file?).

Unbundle should handle whatever bundle produces.

4.5. commit

Commit currently recurses by default. Its default behavior should be consistent with the behavior of other commands regarding subrepo handling.

4.6. diff

Diff should recurse through subrepos. It is important that the correct revisions are looked up in each subrepo. For example, one should be able to execute hg diff -r 3 -r 10 in repo A and, according to the .hgsubstate files in A and C, have the appropriate diffs done in each of the subrepos.

4.7. export/import

Export should export changesets in the current repo and recurse through subrepos, exporting appropriate changesets from each. Import should handle whatever export produces.

4.8. incoming/outgoing

Incoming and outgoing should recurse into subrepos.

Currently, outgoing's behavior is inconsistent with push: the latter will push changes that outgoing does not show.

4.9. log

The log for subrepos should be interleaved with the log for the parent, according to the .hgsubstate. Options such as "limit" are applied per repo.

4.10. pull

Pull should recurse into subrepos. That is, it should simply do a pull in each subrepo recursively to bring them up to date. This is not to say that they should be updated at the same time. The current mechanism for that, updating the parent, is fine when one wants to update to new revisions of the subrepos based on the state of the parent's .hgsubstate.

4.11. push

Push already recurses by default. The behavior should be consistent with other commands with respect to subrepo handling.

4.12. rollback

Rollback should recurse into subrepos. Does the .hgsubstate file need a revert as well, since it is modified by commit?

4.13. root

Root should not recurse, but it could be extended to report the top-level repository, not just the root of the current repository. With this extension, aliases for other commands could be easily defined (following the UNIX approach of simplicity):

[alias]
commit-all = !hg -R $(hg root --parent) commit --sub
status-all = !hg -R $(hg root --parent) status --sub
diff-all = !hg -R $(hg root --parent) diff --sub
in-all = !hg -R $(hg root --parent) incoming --sub
out-all = ...

It would probably not take much more effort to allow the --parent option on the listed commands directly, avoiding the need for aliases.

4.14. status

Status should recurse into subrepos. Subrepos containing uncommitted changes should be listed, and a subrepo that does not have uncommitted changes but is not at the revision specified in its parent repo should also be listed.

For example, suppose the user wants to see the status starting in A. Repo B contains no uncommitted changes but is not at the revision specified in A's .hgsubstate. Repos C and D contain uncommitted changes. The output could look like this:

$ hg status
M B/
M C/
M C/file1.txt
M C/D/
A C/D/file2.txt
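The proposed rule for when a subrepo itself is listed can be written down as a short Python sketch. It covers only the subrepo lines (M B/, M C/, M C/D/), not the per-file lines, and everything in it is hypothetical: the function name, the dictionary keys, and the revision labels are invented for illustration, since this behavior is a wish rather than an existing implementation.

```python
def subrepo_status_lines(subrepos):
    """Return 'M path/' for each subrepo that should be listed.

    `subrepos` is a list of dicts with hypothetical keys:
      path     - path relative to the top-level repo
      dirty    - True if the subrepo has uncommitted changes
      current  - revision the subrepo's working directory is at
      recorded - revision stored in the parent's .hgsubstate
    """
    lines = []
    for sub in subrepos:
        # Listed when dirty, or when clean but not at the recorded revision.
        if sub["dirty"] or sub["current"] != sub["recorded"]:
            lines.append("M %s/" % sub["path"])
    return lines


# Made-up revision labels matching the scenario described above.
example = [
    {"path": "B", "dirty": False, "current": "r2", "recorded": "r1"},
    {"path": "C", "dirty": True, "current": "r5", "recorded": "r5"},
    {"path": "C/D", "dirty": True, "current": "r7", "recorded": "r7"},
]
print("\n".join(subrepo_status_lines(example)))
```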

4.15. update

Update currently recurses by default. The default behavior should be consistent with other commands with respect to subrepo handling.

5. To Do