Differences between revisions 77 and 104 (spanning 27 versions)
Revision 77 as of 2011-10-15 00:18:16
Size: 13239
Editor: 173-160-136-22-Washington
Comment: Added some clarity around nested subpaths and fixed the paths used in the example and in the text.
Revision 104 as of 2019-05-24 05:05:05
Size: 16986
Comment: fix links to bz

Subrepository

<!> This is considered a feature of last resort.

Automatic management of nested repositories from other sources. See also Mercurial's built-in help on subrepos (hg help subrepos).

1. Introduction

Subrepositories are a feature that lets you treat a collection of repositories as a group, so that you can clone, commit to, push, and pull projects and their associated libraries together.

This feature was introduced in a preliminary form in Mercurial 1.3 and has been improved steadily since then. Some commands still lack proper support for subrepositories; we will fix them as we come across them and as we figure out how best to make them subrepo-aware.

For those used to Subversion, this concept is closest to what you can achieve with Subversion directories marked with the svn:externals property. Mercurial 1.5 has support for using Subversion repositories as subrepos. Mercurial 1.8 also has support for Git.

2. Basic usage

2.1. Start

To start using subrepositories, you need two repositories, a main repo and a nested repo:

$ hg init main
$ cd main
$ hg init nested

Next we'll mark the directory 'nested' as a Mercurial subrepository by creating an entry for it in the special '.hgsub' file.

$ echo nested = nested > .hgsub
$ hg add .hgsub

The left-hand side of the assignment is the path in our working directory, and the right-hand side specifies the source to pull from.

<!> Because .hgsub is shared across platforms, Windows users must use '/' as the path separator for local directories.

The source path of a Mercurial repository can be a relative path, an absolute path, or a URL. Trivial relative paths, where the source path is the same as the working directory path, are generally recommended because they ensure that the subrepositories can always be found 'in place'.

Other relative paths can be used if the subrepositories can't be hosted 'in place', for example because of limitations of a central repository or hosting service. One consequence of such non-trivial relative paths is that clones of the main repository generally can't be cloned in turn. URLs and absolute paths can be useful if the subrepository is hosted centrally without any DVCS workflows and should therefore never be re-cloned. URLs can also be used for external resources, but in that case it might be better to use a local distributed mirror with a trivial relative path instead.
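To put the three kinds of source paths side by side, here is an illustrative .hgsub with one entry of each. The names nested, shared and vendor and the example.com URL are placeholders, not part of the original walkthrough.

```shell
# Illustrative .hgsub with one entry of each kind described above
# (placeholder names; this overwrites any .hgsub in the current directory):
#   nested - trivial relative path, source same as location (recommended)
#   shared - non-trivial relative path, resolved against the parent's source
#   vendor - URL, for a centrally hosted or external repository
cat > .hgsub <<'EOF'
nested = nested
shared = ../shared
vendor = https://example.com/hg/vendor
EOF
```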

2.1.1. Nested repository subpaths

Note that subpaths can be used to redefine source paths locally.

Here we define that the Mercurial subrepository 'nested' pulls from 'nested/repo/path', a path relative to main. This says 'anyone who can find our main repo can find the nested repo just by appending nested/repo/path to that path'.

Note that the nested repository must actually exist for the line in .hgsub to do anything. For instance, if rather than creating a local nested repository you attempt to link to a pre-existing remote one, you must ALSO clone that repository:

$ echo nested = nested/repo/path > .hgsub
$ hg add .hgsub
$ hg clone https://example.com/nested/repo/path nested

If you intend to track something other than the current revision of the default branch, this is also the time to update the subrepo to the desired revision.

Now let's create a file in nested, add it, and commit it in the subrepository:

$ echo test > nested/foo
$ hg -R nested add nested/foo
$ hg -R nested commit --message "Initial commit."

2.1.2. Non-Mercurial Subrepositories

As of version 1.5, Mercurial also supports other repository types for your subrepo. The type of the subrepository can be specified in square brackets before the source; the default is '[hg]'.

2.1.2.1. SVN subrepositories

If you want a subrepo that refers to a Subversion repository, you would do something like this:

$ echo "nested = [svn]https://example.com/nested/trunk/path" > .hgsub
$ hg add .hgsub
$ svn co https://example.com/nested/trunk/path nested

2.1.2.2. Git subrepositories

As of version 1.8, Mercurial supports Git subrepositories:

$ echo "nested = [git]git://example.com/nested/repo/path.git" > .hgsub
$ hg add .hgsub
$ git clone git://example.com/nested/repo/path.git nested

2.2. Committing

When we commit, Mercurial will attempt to create a consistent snapshot of the state of the entire project and its subrepos. It does this by recording the state of all subrepos. As of version 2.0, commits are aborted when there are uncommitted changes in subrepos.

$ hg ci -mtest
committing subrepository nested

<!> Subrepo states are stored in a '.hgsubstate' file that is managed automatically by Mercurial. Do not manually edit this file!

2.3. Update

Whenever Mercurial encounters a changeset containing subrepos, it will attempt to pull the specified subrepos and update them to the appropriate state:

$ cd ..
$ hg clone main main2
updating working directory
pulling subrepo nested
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cat main2/nested/foo
test

Subrepos may also contain their own subrepos and Mercurial will recurse as necessary.

To do an explicit update from a repository that is not the default one, specify the repository's source with --config paths.default=.

2.4. Push

When you push, Mercurial will first attempt to push all subrepos of the current repository. This ensures that new changesets in subrepos are available when they are referenced by top-level repositories.

2.5. Pull

Notably, the 'pull' command is not recursive by default. This is because Mercurial doesn't know which subrepos are required until an update to a specific changeset is requested. The update will pull the requested subrepositories and changesets on demand. To pull and update in one step, use 'pull --update'.

Note that this matches exactly how 'pull' works without subrepositories, considering that subrepositories live in the working directory:

  • 'hg pull' gives you the upstream changesets but doesn't affect your working directory.

  • 'hg update' updates the contents of your working directory (both in the top repo and in all subrepos).

It might be a good idea to always pull with --update if you have any subrepositories. That generally ensures that updates won't miss any changesets and thus won't need to trigger any pulls. If the pull with update fails due to crossing branches, 'hg update' must be used to get all the subrepository updates.

2.6. Synchronizing in subrepositories

Subrepos don't automatically track the latest changeset of their sources. Instead, they are updated to the changeset recorded in the top-level changeset that is checked out. This way, developers always get a consistent set of compatible code and libraries when they update.

Thus, updating subrepos is a manual process: run 'hg pull' and 'hg up' in the target subrepo, test in the top-level repo, then commit in the top-level repo to record the new combination. The onsub extension can be used to automate this.

2.7. Delete

To remove a subrepo from the parent repo, you must delete the subrepo definition from the '.hgsub' file at the top level of the parent repo. Once you do this, the subrepo tree will show up as a set of unknown files when you run hg status, and you can delete the files.
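Editing .hgsub by hand is usually enough. For scripted cleanup, the hypothetical helper remove_subrepo_entry below (not part of Mercurial) drops the entry for one path. Entries for other paths and any section such as [subpaths] are kept unchanged.

```shell
# remove_subrepo_entry PATH: hypothetical helper, not part of Mercurial.
# Rewrites ./.hgsub without the entry whose left-hand side equals PATH.
# Lines from the first "[section]" header onward are copied unchanged.
remove_subrepo_entry() {
  awk -v p="$1" '
    /^\[/ { sect = 1 }                 # [subpaths] or similar: keep as-is
    !sect {
      i = index($0, "=")
      if (i > 0) {
        k = substr($0, 1, i - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)
        if (k == p) next               # drop the entry being removed
      }
    }
    { print }
  ' .hgsub > .hgsub.tmp && mv .hgsub.tmp .hgsub
}
```

Typical use is remove_subrepo_entry nested, then hg commit in the parent (Mercurial maintains .hgsubstate itself). After that, delete the nested directory, which hg status now reports as unknown files.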

3. Recommendations

3.1. Use a thin shell repository to manage your subrepositories

The most obvious way to construct a project using subrepositories is:

project/      # your main project repository
  somelib/    # your shared library as a nested subrepository

This tends to be suboptimal for a variety of reasons:

  • overly strict tracking of the relationship between project/ and somelib/
  • impossible to check or push project/ if the somelib/ source repo becomes unavailable
  • lack of well-defined support for recursive diff, log, and status
  • the recursive nature of commit can be surprising

The recommended structure is of this form:

build/      # thin master repo to manage build environment
  project/  # your main project as a subrepo
  somelib/  # your shared library as a sibling subrepo

Here, all repositories containing 'real' code have no subrepositories of their own (i.e. they are leaf nodes). They can thus be treated as completely ordinary repositories, and a developer can largely ignore the additional complexities of subrepositories. Work can continue in these repositories even if their siblings become unavailable. Recursive commits in build/ are only needed to synchronize changes between siblings and to tag releases.

3.2. Use 'trivial' subrepo paths where possible

Mercurial accepts both complex and absolute subrepo paths, but these may cause a variety of issues:

  • Absolute URLs are subject to change and may make old versions of the project difficult to reconstruct
  • Relative paths of the form "foo = ../foo" will not generally allow clones to be cloned
  • Paths containing drive letters, UNC paths, backslashes, or other Windows-isms will generally not be portable

The most reliable scheme is to have all subrepo paths of the form:

project = project
somelib = somelib

where the source and target are both the same simple directory name.
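This rule of thumb can be checked mechanically. The helper check_hgsub below is hypothetical and not part of Mercurial. It treats an entry as trivial when its source, after an optional [hg] prefix is removed, equals its path. It stops reading at the first section header such as [subpaths], so svn/git entries with URLs are reported too.

```shell
# check_hgsub [FILE]: hypothetical helper, not part of Mercurial.
# Prints every entry whose source differs from its path and returns
# non-zero if it found any. Only entries before the first "[section]"
# header are checked, so a [subpaths] section is ignored.
check_hgsub() {
  awk '
    BEGIN { bad = 0 }
    /^[ \t]*(#|;|$)/ { next }          # comments and blank lines
    /^\[/ { sect = 1 }                 # start of [subpaths] or similar
    sect { next }
    {
      i = index($0, "=")
      if (i == 0) next
      path = substr($0, 1, i - 1)
      src = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", path)
      gsub(/^[ \t]+|[ \t]+$/, "", src)
      sub(/^\[hg]/, "", src)           # "[hg]" is the default kind
      if (src != path) {
        print "non-trivial: " path " = " src
        bad = 1
      }
    }
    END { exit bad }
  ' "${1:-.hgsub}"
}
```

Running check_hgsub from the top of the working directory checks ./.hgsub. A file argument can be passed instead.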

{i} On hgweb servers, it can be useful to use symlinks or duplicate path entries to allow shared libraries to appear in multiple places.

{i} One workaround with Mercurial 2.0 is to use [subpaths] in .hgsub to map "ideal" paths to the flat namespace used by some hosting providers. For example, a project hosted at https://bitbucket.org/kiilerix/subrepodemo/ could have a .hgsub like this:

sub = sub
[subpaths]
https://bitbucket\.org/kiilerix/subrepodemo/sub = https://bitbucket.org/kiilerix/subrepodemo-sub

Similar subpaths magic can be used for pushing to GitHub with hg-git:

^git\+https://github\.com/([^/]*)/([^/]*)/([^/]*)$ = git+https://github.com/\1/\3.git
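To preview what a rewrite rule does to a source URL before committing it, you can approximate the rule with sed -E. This is only an illustration. Mercurial applies [subpaths] rules with Python regular expressions. The simple patterns above mean the same thing in POSIX extended regular expressions, but more elaborate ones may not. preview_subpath is a hypothetical helper. The alice/project/sub URL in the second call is made up.

```shell
# preview_subpath PATTERN REPLACEMENT URL: hypothetical helper.
# Approximates a [subpaths] rule with sed -E; "#" is the sed delimiter,
# so neither PATTERN nor REPLACEMENT may contain "#".
preview_subpath() {
  printf '%s\n' "$3" | sed -E "s#$1#$2#"
}

preview_subpath 'https://bitbucket\.org/kiilerix/subrepodemo/sub' \
  'https://bitbucket.org/kiilerix/subrepodemo-sub' \
  'https://bitbucket.org/kiilerix/subrepodemo/sub'
# prints https://bitbucket.org/kiilerix/subrepodemo-sub

preview_subpath '^git\+https://github\.com/([^/]*)/([^/]*)/([^/]*)$' \
  'git+https://github.com/\1/\3.git' \
  'git+https://github.com/alice/project/sub'
# prints git+https://github.com/alice/sub.git
```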

4. Caveats

As this feature solves a complex problem and is quite young, there are a number of rough edges:

  • Some commands require a -S or --subrepos switch to operate on subrepos (available since Mercurial 1.7)

  • Many commands are not aware of subrepos
  • Update/merge currently can't remove subrepositories entirely as that might lose local-only changes
  • There's no support for merging across renaming/moving subrepos
  • Collisions between normal files and subrepos are not handled
  • Subrepository pulls are always delayed until needed by an update
  • pull -r will not filter revisions pulled in subrepositories

  • Push similarly ignores URLs and revision filters
  • Commit doesn't propagate some flags like -A to subrepos

  • Update of parent repo will pull no subrepo changesets (if null revision is indicated) or all changesets (if any other revision is indicated). This behavior is by design.

4.1. Inherited hooks

Note: This issue has been resolved since Mercurial 2.4.

For those operations that imply a subrepository recursion (update, push, commit, etc.), the container repository's configuration is inherited and merged with the subrepository's .hg/hgrc. In particular, the [hooks] section is inherited when operating on a particular subrepo. Therefore, if you have hooks configured on the outer repository, they will also be executed for each subrepository.

This situation can lead to trouble if your hook scripts are versioned in the same repository. For instance, you may want to use preupdate and update hooks to implement a deploy mechanism based on Mercurial. A possible configuration for the production repository could be the following:

[hooks]
preupdate = ./scripts/deploy.sh prepare
update = ./scripts/deploy.sh finalize

If you have subrepositories and a particular update needs to proceed recursively into any of them, the preupdate hook will try to execute ./scripts/deploy.sh. Since that path is very unlikely to exist relative to the subrepository root, the hook will fail and prevent the update. This particular issue is difficult to solve, because some other configuration options should be inherited (see issue2904).

The solution for this is to define empty hooks at each subrepository configuration, like the following:

[hooks]
preupdate =
update =

This way, the inherited configuration is overridden.
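With many subrepositories, adding those empty entries by hand gets tedious. The helper disable_inherited_hooks below is a hypothetical sketch, not part of Mercurial. It assumes you run it from the top of the parent working directory with all subrepos checked out. It reads the Mercurial-type entries from .hgsub and appends an empty [hooks] section to each one's .hg/hgrc. It only covers the preupdate and update hooks from the example above, so extend the printf line for any other hooks you have configured.

```shell
# disable_inherited_hooks: hypothetical helper, not part of Mercurial.
# For each Mercurial subrepo listed in ./.hgsub (no kind prefix, or [hg]),
# append empty preupdate/update hooks to its .hg/hgrc so the hooks
# inherited from the parent are overridden. Running it twice just appends
# a second, identical [hooks] section.
disable_inherited_hooks() {
  awk '
    /^[ \t]*(#|;|$)/ { next }
    /^\[/ { sect = 1 }                 # stop at [subpaths] or similar
    sect { next }
    {
      i = index($0, "=")
      if (i == 0) next
      path = substr($0, 1, i - 1)
      src = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", path)
      gsub(/^[ \t]+/, "", src)
      if (src ~ /^\[/ && src !~ /^\[hg]/) next   # [svn], [git]: no .hg/hgrc
      print path
    }
  ' .hgsub |
  while IFS= read -r path; do
    [ -d "$path/.hg" ] || continue     # not checked out yet
    printf '\n[hooks]\npreupdate =\nupdate =\n' >> "$path/.hg/hgrc"
    echo "overrode inherited hooks in $path"
  done
}
```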

5. Tips & tricks

5.1. How to find out which changeset of the main repository contains a specified changeset of a subrepository

Assuming you know the hash of the subrepository changeset you need, for example "hash1234blablabla", go to the main repository's directory and type:

$ hg grep hash1234blablabla .hgsubstate

6. Alternatives

If you're looking for package management rather than precise version control, here are some possible alternatives:

  • Python
    • Guest Repo Extension
    • PyPI and pip
    • repoman: Mercurial-based repository forest manager
    • Non-Mercurial: Google's Repo tool, used by the Android project, written in Python, specific to Git
  • PHP
    • Composer
  • Java
    • Maven, Ivy, Artifactory, etc. (package sub-project or dependency as a .jar file)
  • .NET
    • NuGet, sponsored by Microsoft, can host custom feeds for internal packages

  • All
    • Shell script (or .bat file) at root of repository that contains a series of 'hg clone' calls to get each module. Another script calls 'hg incoming' for each of them to check for updates.
    • Mercurial extension as an alternative to a shell script or .bat file. (A Mercurial extension comes pretty close to being an "hg script file". Written in Python, it will run easily anywhere that Mercurial runs, without any need to install another interpreter.) The extension could include a post-update hook to check for new upstream changes, and a pre-commit hook for local modifications.

7. See Also

8. How it Works

To help explain how this works, think of the top-level repository as something that is completely dumb except for two tasks:

  • .hgsub stores a path to a subrepository

  • .hgsubstate stores a hash of each current subrepository commit

The top-level repository "knows" nothing else. All other tasks are accomplished with scripts that look up the path and hash, then go to that specified location and version to look up additional information from the subrepository.

That is why branches are separate between the top level and the subrepository. The top level has a simple .hgsub file storing paths and a .hgsubstate file storing hashes. The top-level repository sees those files just like any other file in a repository, and you can branch them like any other file, thus storing different subrepository commit hashes in different top-level branches.

Independently, each subrepository contains the full source history and branches of that repository. No subrepository knows anything about the top level; it exists in its own little world.

Any features, tools, or behaviors arise from scripts that perform automated lookups between the parent and subrepository for your convenience. There are no smarts built into the repositories themselves.

Essentially, you can assume in all cases that neither the parent nor the subrepositories know anything about each other. Instead, simple scripts look up strings in the .hgsub and .hgsubstate files and, in effect, tie the repositories together, performing automated clones, pulls, and updates for you. This makes it easy to track exact versions of multiple repositories and keep them consistent with each other, so that the right version of the parent repository and of each subrepository is checked out.
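To make this model concrete, here is a read-only sketch that performs the same kind of lookup. For each line of .hgsubstate (a revision hash followed by a path), it finds the matching entry in .hgsub. It then prints the path, the source, and the pinned revision. show_subrepos is a hypothetical helper. It ignores any [subpaths] rewriting and never contacts a repository.

```shell
# show_subrepos: hypothetical helper, not part of Mercurial.
# Joins ./.hgsubstate ("<rev> <path>") with ./.hgsub ("<path> = <source>")
# and prints "path<TAB>source<TAB>rev" for each recorded subrepo.
show_subrepos() {
  [ -f .hgsubstate ] || { echo "no .hgsubstate yet" >&2; return 1; }
  while read -r rev path; do
    src=$(awk -v p="$path" '
      /^\[/ { exit }                   # stop at [subpaths] or similar
      {
        i = index($0, "=")
        if (i == 0) next
        k = substr($0, 1, i - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)
        if (k == p) {
          v = substr($0, i + 1)
          gsub(/^[ \t]+|[ \t]+$/, "", v)
          print v
          exit
        }
      }
    ' .hgsub)
    printf '%s\t%s\t%s\n' "$path" "$src" "$rev"
  done < .hgsubstate
}
```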



Subrepository (last edited 2019-05-24 05:05:05 by AntonShestakov)